The group property will cause the lexer to collect Tokens of this type separately from the other Tokens.
For example, this could be used to collect comments for post-processing.
See: https://github.com/chevrotain/chevrotain/tree/master/examples/lexer/token_groups
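The effect of the group property can be sketched with a small standalone model. This is not Chevrotain's implementation: the tokenizeWithGroups function and the skip flag are hypothetical names invented for this illustration, and only name, pattern and group mirror the properties described here.

```javascript
// Simplified model of the "group" property; NOT Chevrotain's implementation.
// "skip" is a hypothetical flag of this sketch, used only to drop whitespace.
const tokenConfigs = [
  { name: "Comment", pattern: /\/\/[^\n]*/y, group: "comments" },
  { name: "Identifier", pattern: /[a-zA-Z_]\w*/y },
  { name: "Whitespace", pattern: /\s+/y, skip: true },
];

function tokenizeWithGroups(text) {
  const result = { tokens: [], groups: {} };
  let offset = 0;
  while (offset < text.length) {
    let matched = false;
    for (const config of tokenConfigs) {
      // Sticky regexes only match exactly at lastIndex.
      config.pattern.lastIndex = offset;
      const match = config.pattern.exec(text);
      if (match === null) continue;
      matched = true;
      offset += match[0].length;
      if (config.skip) break;
      const token = { image: match[0], type: config.name };
      if (config.group !== undefined) {
        // Tokens with a group are collected separately from the main token list.
        if (!result.groups[config.group]) result.groups[config.group] = [];
        result.groups[config.group].push(token);
      } else {
        result.tokens.push(token);
      }
      break;
    }
    if (!matched) throw new Error(`Unexpected character at offset ${offset}`);
  }
  return result;
}

const grouped = tokenizeWithGroups("foo // first\nbar // second");
console.log(grouped.tokens.map((t) => t.image)); // [ 'foo', 'bar' ]
console.log(grouped.groups.comments.map((t) => t.image)); // [ '// first', '// second' ]
```

The comments never reach the main token list, so a parser would not need to handle them, yet they remain available for post-processing.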
The label is a human-readable name to be used in error messages and syntax diagrams.
For example, a TokenType may be called LCurly, which is short for "left curly brace". A label that is much easier to understand could simply be "{".
Can a string matching this Token Type's pattern possibly contain a line terminator? If it can and the line_breaks property is not set to true, the Lexer's line and column tracking will be inaccurate.
The "longer_alt" property will cause the Lexer to attempt matching against other Token Types every time this Token Type has been matched.
This feature can be useful when two or more Token Types have common prefixes which cannot be resolved (only) by the ordering of the Tokens in the lexer definition.
Note that the longer_alt capability cannot be chained.
Note that the first matched longer_alt takes precedence.
For an example of resolving the keywords vs. Identifier ambiguity, see: https://github.com/chevrotain/chevrotain/tree/master/examples/lexer/keywords_vs_identifiers
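The keywords vs. identifiers case can be illustrated with a small standalone model. This is a sketch, not Chevrotain's implementation: tokenizeWithLongerAlt and matchAt are hypothetical helpers, and the rule "keep the alternative only if it is longer" is an assumption inferred from the property's name.

```javascript
// Simplified model of "longer_alt"; NOT Chevrotain's implementation.
const Identifier = { name: "Identifier", pattern: /[a-zA-Z_]\w*/y };
const Do = { name: "Do", pattern: /do/y, longer_alt: Identifier };
const Space = { name: "Space", pattern: /\s+/y };
// The keyword is listed before Identifier, so ordering alone would split
// "done" into the keyword "do" followed by "ne".
const orderedTokenTypes = [Space, Do, Identifier];

// Returns the matched text at exactly `offset`, or null when there is no match.
function matchAt(tokenType, text, offset) {
  tokenType.pattern.lastIndex = offset;
  const match = tokenType.pattern.exec(text);
  return match === null ? null : match[0];
}

function tokenizeWithLongerAlt(text) {
  const tokens = [];
  let offset = 0;
  while (offset < text.length) {
    let chosen = null;
    for (const tokenType of orderedTokenTypes) {
      const image = matchAt(tokenType, text, offset);
      if (image === null) continue;
      chosen = { image, type: tokenType.name };
      // After a successful match, also try the alternative (one level only,
      // no chaining) and prefer it when it consumes more characters.
      if (tokenType.longer_alt !== undefined) {
        const altImage = matchAt(tokenType.longer_alt, text, offset);
        if (altImage !== null && altImage.length > image.length) {
          chosen = { image: altImage, type: tokenType.longer_alt.name };
        }
      }
      break;
    }
    if (chosen === null) throw new Error(`Unexpected character at offset ${offset}`);
    offset += chosen.image.length;
    if (chosen.type !== "Space") tokens.push(chosen);
  }
  return tokens;
}

const longerAltTokens = tokenizeWithLongerAlt("do done");
console.log(longerAltTokens.map((t) => `${t.type}:${t.image}`)); // [ 'Do:do', 'Identifier:done' ]
```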
This defines what sequence of characters would be matched to this TokenType when lexing.
For Custom Patterns see: http://chevrotain.io/docs/guide/custom_token_patterns.html
If "pop_mode" is true the Lexer will pop the last mode of the modes stack and continue lexing using the new mode at the top of the stack.
The name of a Lexer mode to "enter" once this Token Type has been matched. Lexer modes can be used to support different sets of possible Token Types.
Lexer Modes work as a stack of Lexers, so "entering" a mode means pushing it to the top of the stack.
See: https://github.com/chevrotain/chevrotain/tree/master/examples/lexer/multi_mode_lexer
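The interaction of push_mode and pop_mode with the modes stack can be sketched as follows. This is a simplified standalone model, not Chevrotain's implementation: the modes table, the mode names and tokenizeWithModes are hypothetical, and the guard against popping the last remaining mode is a choice made for this sketch only.

```javascript
// Simplified model of Lexer modes as a stack; NOT Chevrotain's implementation.
// Each mode has its own set of possible Token Types.
const modes = {
  outside: [
    { name: "Open", pattern: /\(/y, push_mode: "inside" },
    { name: "Word", pattern: /[a-z]+/y },
    { name: "Space", pattern: /\s+/y },
  ],
  inside: [
    { name: "Close", pattern: /\)/y, pop_mode: true },
    { name: "Number", pattern: /\d+/y },
    { name: "Space", pattern: /\s+/y },
  ],
};

function tokenizeWithModes(text, defaultMode) {
  const modeStack = [defaultMode];
  const tokens = [];
  let offset = 0;
  while (offset < text.length) {
    // The mode at the top of the stack decides which Token Types are tried.
    const currentMode = modeStack[modeStack.length - 1];
    let matchedType = null;
    let image = null;
    for (const tokenType of modes[currentMode]) {
      tokenType.pattern.lastIndex = offset;
      const match = tokenType.pattern.exec(text);
      if (match !== null) {
        matchedType = tokenType;
        image = match[0];
        break;
      }
    }
    if (matchedType === null) {
      throw new Error(`No token of mode "${currentMode}" matches at offset ${offset}`);
    }
    offset += image.length;
    if (matchedType.name !== "Space") {
      tokens.push({ image, type: matchedType.name, mode: currentMode });
    }
    // pop_mode removes the top of the stack (this sketch never pops the last mode).
    if (matchedType.pop_mode === true && modeStack.length > 1) modeStack.pop();
    // push_mode "enters" a mode by pushing it to the top of the stack.
    if (matchedType.push_mode !== undefined) modeStack.push(matchedType.push_mode);
  }
  return tokens;
}

const modeTokens = tokenizeWithModes("ab (12) cd", "outside");
console.log(modeTokens.map((t) => `${t.mode}:${t.type}`));
// [ 'outside:Word', 'outside:Open', 'inside:Number', 'inside:Close', 'outside:Word' ]
```

Digits are only valid in the "inside" mode here, so the same input without the parentheses would be rejected.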
Possible starting characters or charCodes of the pattern. These will be used to optimize the Lexer's performance.
These are normally computed automatically; however, specifying them explicitly can enable optimizations even when the automatic analysis fails.
e.g.:
String hints should be one character long.
{ start_chars_hint: ["a", "b"] }
Number hints are the result of running ".charCodeAt(0)" on the strings.
{ start_chars_hint: [97, 98] }
For Unicode characters outside the BMP, use the first code unit of their surrogate pair. For example, the '💩' character is represented by the surrogate pair '\uD83D\uDCA9', and D83D is 55357 in decimal.
Note that "💩".charCodeAt(0) === 55357
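The numeric hints can be derived with plain JavaScript, as in the minimal sketch below. The toStartCharCodes helper and the AorB token config are hypothetical names used only for illustration; charCodeAt is the standard String method.

```javascript
// Hypothetical helper (not part of Chevrotain): derive numeric hints
// from sample strings that the pattern may start with.
function toStartCharCodes(samples) {
  // charCodeAt(0) returns the first UTF-16 code unit. For characters outside
  // the BMP this is the first (high) surrogate of the pair.
  return samples.map((sample) => sample.charCodeAt(0));
}

const asciiHints = toStartCharCodes(["a", "b"]);
const astralHints = toStartCharCodes(["\uD83D\uDCA9"]); // the pile of poo character

console.log(asciiHints); // [ 97, 98 ]
console.log(astralHints); // [ 55357 ]
console.log((0xd83d).toString(10)); // 55357

// The resulting numbers can be placed in a token config object.
const tokenConfig = { name: "AorB", pattern: /[ab]+/, start_chars_hint: asciiHints };
```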
Categories enable polymorphism on Token Types. A TokenType X with categories C1, C2, ..., Cn can be matched by the parser against any of those categories. In practical terms, this means that CONSUME(C1) can match a Token of type X.
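The matching rule can be modeled in a few lines. This is a simplified standalone sketch, not Chevrotain's implementation: matchesTokenType and the token type objects are hypothetical, and the sketch only checks the categories listed directly on a Token Type.

```javascript
// Simplified model of token categories; NOT Chevrotain's implementation.
const Keyword = { name: "Keyword" };
const Literal = { name: "Literal" };
const TrueLiteral = { name: "True", categories: [Keyword, Literal] };
const NumberLiteral = { name: "NumberLiteral", categories: [Literal] };

// Hypothetical matcher: a token matches its own type or any of that type's categories.
function matchesTokenType(token, expectedType) {
  const actualType = token.tokenType;
  if (actualType === expectedType) return true;
  return (actualType.categories || []).includes(expectedType);
}

const trueToken = { image: "true", tokenType: TrueLiteral };
const numberToken = { image: "42", tokenType: NumberLiteral };

console.log(matchesTokenType(trueToken, Literal)); // true
console.log(matchesTokenType(trueToken, Keyword)); // true
console.log(matchesTokenType(numberToken, Literal)); // true
console.log(matchesTokenType(numberToken, Keyword)); // false
```

In this model a grammar rule that expects Literal accepts both tokens, while one that expects Keyword accepts only the "true" token.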