Quick guide to writing scanning definitions in JavaCC
Define tokens
The following defines the tokens TOK1 and TOK2:
TOKEN : { < TOK1: ...> }
TOKEN : { < TOK2: ...> }
The following defines the same thing in a shorter way:
TOKEN : { < TOK1: ...> | < TOK2: ...> }
The following makes matching case-insensitive (upper and lower case letters are treated alike) for tokens T1 and T2:
TOKEN [IGNORE_CASE] : { < T1: ...> | < T2: ...> }
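To make the effect of IGNORE_CASE concrete, here is a small Java sketch that mimics it with java.util.regex. It is only an analogy, not code produced by JavaCC. The token names BEGIN and END and their patterns are hypothetical, since the definitions above leave the patterns elided.

```java
import java.util.regex.Pattern;

// Rough java.util.regex analogy (an assumption for illustration) of:
//   TOKEN [IGNORE_CASE] : { < BEGIN: "begin" > | < END: "end" > }
// BEGIN and END are hypothetical token names, not taken from the guide.
final class IgnoreCaseDemo {
    static final Pattern BEGIN = Pattern.compile("begin", Pattern.CASE_INSENSITIVE);
    static final Pattern END = Pattern.compile("end", Pattern.CASE_INSENSITIVE);

    /** Returns the hypothetical token name that matches the whole lexeme, or null. */
    static String classify(String lexeme) {
        if (BEGIN.matcher(lexeme).matches()) return "BEGIN";
        if (END.matcher(lexeme).matches()) return "END";
        return null;
    }

    public static void main(String[] args) {
        System.out.println(classify("Begin")); // prints BEGIN
        System.out.println(classify("END"));   // prints END
        System.out.println(classify("stop"));  // prints null
    }
}
```

Without the CASE_INSENSITIVE flag, which plays the role of [IGNORE_CASE] here, only the exact spellings "begin" and "end" would be accepted.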
Regular expressions
TOKEN : {
  < A: (...)* >                     -- Any number of ...
| < B: (...)+ >                     -- At least one of ...
| < C: (...)? >                     -- Zero or one of ...
| < D: xxx | yyy >                  -- Either xxx or yyy
| < E: "..." >                      -- Matches the text ...
| < F: ["a"-"z","+"] >              -- Matches one lower case letter or a plus sign
| < G: ~["a"-"z", "+"] >            -- Matches any character *except* a lower
                                    -- case letter or the plus sign
| < H: " " | "\t" | "\n" | "\r" >   -- Matches a blank, a tab, a newline,
                                    -- or a return character
| < I: < DIGIT > | < LETTER > >     -- Matches the same as DIGIT or LETTER
| < #DIGIT: ["0"-"9"] >             -- The #-sign means that DIGIT and LETTER
| < #LETTER: ["a"-"z", "A"-"Z"] >   -- are helper definitions that do not
                                    -- produce tokens
| < J: "\"" >                       -- Matches the quotation character
| < K: "\\" >                       -- Matches the backslash character
}
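The operators above correspond closely to familiar regular-expression notation. The following Java sketch restates a few of them with java.util.regex so that they can be tried out directly. It is an analogy under stated assumptions: java.util.regex is a different engine from the scanner JavaCC generates, and the ID pattern is a hypothetical example that does not appear in the list above.

```java
import java.util.regex.Pattern;

// java.util.regex counterparts of some JavaCC expressions (illustrative only).
final class RegexAnalogies {
    // Plain strings play the role of the helper definitions #DIGIT and #LETTER:
    // they are building blocks, not patterns of their own.
    static final String DIGIT = "[0-9]";
    static final String LETTER = "[a-zA-Z]";

    // < F: ["a"-"z","+"] >   one lower case letter or a plus sign
    static final Pattern F = Pattern.compile("[a-z+]");
    // < G: ~["a"-"z","+"] >  any single character except those
    static final Pattern G = Pattern.compile("[^a-z+]");
    // < I: <DIGIT> | <LETTER> >
    static final Pattern I = Pattern.compile(DIGIT + "|" + LETTER);
    // < K: "\\" >  a single backslash (escaped once for Java, once for the regex)
    static final Pattern K = Pattern.compile("\\\\");
    // Hypothetical token: <LETTER> (<LETTER> | <DIGIT>)*
    static final Pattern ID = Pattern.compile(LETTER + "(" + LETTER + "|" + DIGIT + ")*");

    /** True if the pattern matches the entire string. */
    static boolean matches(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches(ID, "x27")); // prints true
        System.out.println(matches(ID, "27x")); // prints false
    }
}
```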
A, B, ... are the names of the tokens. For SKIP and MORE, names are normally omitted.
For example, simply write:
SKIP : {
  " "     -- skip blanks
| "\t"    -- skip tabs
| "\n"    -- skip newlines
| "\r"    -- skip returns
| < "//" (~["\n","\r"])* ("\n" | "\r" | "\r\n") >    -- skip single-line comments
}
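What skipping means for the resulting token stream can be sketched with a tiny hand-written scanner in Java. This is a simplified illustration, not the scanner JavaCC generates. The WORD token is a hypothetical addition, and the SKIP pattern is a java.util.regex transcription of the rules above, with "\r\n" tried first because java.util.regex takes the first alternative that works rather than the longest match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Minimal sketch: skipped text is consumed but never reported as a token.
final class SkipDemo {
    // Blank, tab, newline, return, or a "//" comment ending in a line terminator.
    static final Pattern SKIP =
        Pattern.compile("[ \\t\\n\\r]|//[^\\n\\r]*(\\r\\n|\\n|\\r)");
    // Hypothetical token (not in the guide): a run of letters and digits.
    static final Pattern WORD = Pattern.compile("[A-Za-z0-9]+");

    /** Returns the WORD lexemes of the input, dropping everything SKIP matches. */
    static List<String> scan(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher skip = SKIP.matcher(input);
        Matcher word = WORD.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            skip.region(pos, input.length());
            word.region(pos, input.length());
            if (skip.lookingAt()) {
                pos = skip.end();             // consume silently
            } else if (word.lookingAt()) {
                tokens.add(word.group());     // report as a token
                pos = word.end();
            } else {
                throw new IllegalArgumentException("Lexical error at offset " + pos);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [alpha, beta, gamma]
        System.out.println(scan("alpha beta // a comment\ngamma"));
    }
}
```

Because the comment rule requires a line terminator at the end, a comment on the last line of the input with no trailing newline is not matched by it. In this sketch that case raises the lexical error.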
Lexical states
Here is an example. Tokens A1 and A2 are recognized only in the DEFAULT state.
Tokens B1, B2, and B3 are recognized only in STATE1. Token C is recognized in both the DEFAULT and STATE1 states. STATE2, which B1 switches to, is assumed to be defined elsewhere; its tokens are not shown.
TOKEN : {
  < A1: ... >             -- the next state is unchanged (DEFAULT)
| < A2: ... > : STATE1    -- the next state is STATE1
}
< STATE1 > TOKEN : {
  < B1: ... > : STATE2    -- the next state is STATE2
| < B2: ... >             -- the next state is unchanged (STATE1)
| < B3: ... > : DEFAULT   -- the next state is DEFAULT
}
< DEFAULT, STATE1 > TOKEN : {
  < C: ... >
}
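The state switching in this example can be simulated with a short Java sketch. It is a simplified model, not JavaCC's generated token manager. The concrete patterns ("a", "[", "!", "b", "]", ",") are hypothetical stand-ins for the elided "..." parts, and STATE2 has no rules here because the example does not show any. Among the rules active in the current state, the sketch picks the longest match and, on a tie, the rule listed first, which is how JavaCC resolves such conflicts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified model of lexical states; all patterns are hypothetical.
final class LexStateDemo {
    enum State { DEFAULT, STATE1, STATE2 }

    /** A token rule: name, pattern, states where it is active, next state (null = unchanged). */
    static final class Rule {
        final String name;
        final Pattern pattern;
        final Set<State> active;
        final State next;

        Rule(String name, String regex, Set<State> active, State next) {
            this.name = name;
            this.pattern = Pattern.compile(regex);
            this.active = active;
            this.next = next;
        }
    }

    static final List<Rule> RULES = Arrays.asList(
        new Rule("A1", "a",   EnumSet.of(State.DEFAULT), null),
        new Rule("A2", "\\[", EnumSet.of(State.DEFAULT), State.STATE1),
        new Rule("B1", "!",   EnumSet.of(State.STATE1),  State.STATE2),
        new Rule("B2", "b",   EnumSet.of(State.STATE1),  null),
        new Rule("B3", "\\]", EnumSet.of(State.STATE1),  State.DEFAULT),
        new Rule("C",  ",",   EnumSet.of(State.DEFAULT, State.STATE1), null));

    /** Returns the names of the tokens found, starting in the DEFAULT state. */
    static List<String> scan(String input) {
        List<String> names = new ArrayList<>();
        State state = State.DEFAULT;
        int pos = 0;
        while (pos < input.length()) {
            Rule best = null;
            int bestEnd = pos;
            for (Rule r : RULES) {
                if (!r.active.contains(state)) continue; // rule not active in this state
                Matcher m = r.pattern.matcher(input).region(pos, input.length());
                // Strictly longer wins, so on a tie the earlier rule is kept.
                if (m.lookingAt() && m.end() > bestEnd) {
                    best = r;
                    bestEnd = m.end();
                }
            }
            if (best == null) {
                throw new IllegalArgumentException(
                    "No token matches at offset " + pos + " in state " + state);
            }
            names.add(best.name);
            pos = bestEnd;
            if (best.next != null) state = best.next; // switch state after the token
        }
        return names;
    }

    public static void main(String[] args) {
        // prints [A1, C, A2, B2, C, B2, B3, A1]
        System.out.println(scan("a,[b,b]a"));
    }
}
```

In the sample input, "," is accepted both before and after "[" because C is active in both states, whereas "b" is accepted only between "[" and "]", that is, while the scanner is in STATE1.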
Last modified: January 5, 2001 / Görel Hedin