Quick guide to writing scanning definitions in JavaCC


Defining tokens

The following defines the tokens TOK1 and TOK2:
TOKEN : { < TOK1: ...> }
TOKEN : { < TOK2: ...> }
The same two tokens can be defined more compactly in a single TOKEN declaration:
TOKEN : { < TOK1: ...> | < TOK2: ...> }
The IGNORE_CASE option makes matching case-insensitive (upper- and lowercase letters are treated alike) for tokens T1 and T2:
TOKEN [IGNORE_CASE] : { < T1: ...> | < T2: ...> }
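As a concrete illustration, the sketch below fills in the elided patterns with string literals. The token names BEGIN, END, SELECT, and FROM and their keywords are hypothetical, chosen only for this example. The annotations are written as // comments, which is JavaCC's own comment syntax, so the fragment can be pasted into a .jj file.
// Case-sensitive keywords: only the exact spellings "begin" and "end" match.
TOKEN : { < BEGIN: "begin" > | < END: "end" > }
// Case-insensitive keywords: "select", "SELECT", and "Select" all match SELECT.
TOKEN [IGNORE_CASE] : { < SELECT: "select" > | < FROM: "from" > }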

Regular expressions

TOKEN : {
  < A: (...)* >      -- zero or more of ...
| < B: (...)+ >      -- one or more of ...
| < C: (...)? >      -- zero or one of ...
| < D: xxx | yyy >   -- either xxx or yyy
| < E: "..." >       -- matches the text ...
| < F: ["a"-"z","+"] >      -- Matches one lowercase letter or a plus sign.
| < G: ~["a"-"z", "+"] >    -- Matches any single character *except* a
                            -- lowercase letter or a plus sign.
| < H: " " | "\t" | "\n" | "\r" >   -- Matches a space, a tab, a newline,
                                    -- or a carriage return
| < I: < DIGIT > | < LETTER > >     -- Matches the same as DIGIT or LETTER
| < #DIGIT: ["0"-"9"] >             -- The # sign marks DIGIT and LETTER as
| < #LETTER: ["a"-"z", "A"-"Z"] >   -- helper definitions: they can be used in
                                    -- other tokens but produce no tokens themselves
| < J: "\"" >                       -- Matches the quotation character
| < K: "\\" >                       -- Matches the backslash character
}
A, B, ... are the names of the tokens. For SKIP and MORE, names are normally omitted. E.g., simply write
SKIP : {
  " "     -- skip blanks
| "\t"    -- skip tabs
| "\n"    -- skip newlines
| "\r"    -- skip returns
| < "//" (~["\n","\r"])* ("\n" | "\r" | "\r\n") >  -- skip single-line comments
}
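The operators above can be combined into complete token definitions. The following is a minimal sketch, not a definitive lexer: the token names ID, INT, and STRING are hypothetical, and the string literal deliberately ignores escape sequences. Annotations are written as // comments, JavaCC's own comment syntax.
TOKEN : {
  < ID: < LETTER > (< LETTER > | < DIGIT >)* >     // a letter followed by letters or digits
| < INT: (< DIGIT >)+ >                            // one or more digits
| < STRING: "\"" (~["\"", "\n", "\r"])* "\"" >     // quoted text on one line, no escapes
| < #LETTER: ["a"-"z", "A"-"Z"] >                  // helper, produces no token
| < #DIGIT: ["0"-"9"] >                            // helper, produces no token
}
Assuming blanks are skipped as in the SKIP example, the input x1 42 would be scanned as an ID followed by an INT.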

Lexical states

Here is an example. The scanner starts in the DEFAULT state, and a declaration without a state list belongs to DEFAULT. Tokens A1 and A2 are recognized only in DEFAULT, and tokens B1, B2, and B3 only in STATE1. Token C is recognized in both DEFAULT and STATE1.
TOKEN : {
  < A1: ... >                     -- the next state is unchanged (DEFAULT)
| < A2: ... > : STATE1            -- the next state is STATE1
}

< STATE1 > TOKEN : {
  < B1: ... > : STATE2            -- the next state is STATE2
| < B2: ... >                     -- the next state is unchanged (STATE1)
| < B3: ... > : DEFAULT           -- the next state is DEFAULT
} 

< DEFAULT, STATE1 > TOKEN : {
  < C: ... >
}
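
A common use of lexical states is skipping multi-line comments. The sketch below shows one way to do this, assuming C-style /* ... */ comments. The state name IN_COMMENT is hypothetical, and annotations are written as // comments, JavaCC's own comment syntax.
SKIP : { "/*" : IN_COMMENT }                // comment start: switch to IN_COMMENT
< IN_COMMENT > SKIP : { "*/" : DEFAULT }    // comment end: switch back to DEFAULT
< IN_COMMENT > MORE : { < ~[] > }           // any other character: keep scanning
Inside IN_COMMENT, both "*/" and ~[] can match at a star, but the scanner prefers the longest match, so the two-character "*/" wins and ends the comment.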

Last modified: January 5, 2001 / Görel Hedin