Interface ITokenConfig




categories?: TokenType | TokenType[]

Categories enable polymorphism on Token Types. A TokenType X with categories C1, C2, ..., Cn can be matched by the parser against any of those categories. In practical terms, CONSUME(C1) can match a Token of type X.
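The matching rule can be sketched without the library. The snippet below is a minimal, self-contained model: SketchTokenType and matchesCategory are hypothetical names invented for this illustration, not part of the actual API, and only direct categories are checked.

```typescript
// Hypothetical, simplified model of a Token Type (not the library's real type).
interface SketchTokenType {
  name: string;
  categories: SketchTokenType[];
}

// True when `actual` is `expected` itself, or lists `expected` as one of its categories.
function matchesCategory(actual: SketchTokenType, expected: SketchTokenType): boolean {
  return actual === expected || actual.categories.indexOf(expected) !== -1;
}

const Keyword: SketchTokenType = { name: "Keyword", categories: [] };
const While: SketchTokenType = { name: "While", categories: [Keyword] };
const Identifier: SketchTokenType = { name: "Identifier", categories: [] };

// A rule that consumes Keyword would accept a While token, but not an Identifier.
console.log(matchesCategory(While, Keyword)); // true
console.log(matchesCategory(Identifier, Keyword)); // false
```

In this model the relation is one-directional: consuming While does not accept a plain Keyword token.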

group?: string

The group property will cause the lexer to collect Tokens of this type separately from the other Tokens.

For example this could be used to collect comments for post processing.
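As a rough illustration of the effect, the sketch below separates already-matched tokens by their group. The SketchToken and SketchLexResult shapes and the collectByGroup helper are assumptions made for this example only; the real Lexer performs this collection internally.

```typescript
// Hypothetical, simplified shapes for illustration only.
interface SketchToken {
  typeName: string;
  image: string;
  group?: string; // mirrors the "group" property described above
}

interface SketchLexResult {
  tokens: SketchToken[];
  groups: { [groupName: string]: SketchToken[] };
}

// Tokens without a group go to the main token list; grouped tokens are collected separately.
function collectByGroup(matched: SketchToken[]): SketchLexResult {
  const result: SketchLexResult = { tokens: [], groups: {} };
  for (const token of matched) {
    if (token.group === undefined) {
      result.tokens.push(token);
    } else {
      const bucket = result.groups[token.group] || [];
      bucket.push(token);
      result.groups[token.group] = bucket;
    }
  }
  return result;
}

const lexed = collectByGroup([
  { typeName: "Identifier", image: "x" },
  { typeName: "Comment", image: "// note", group: "comments" },
  { typeName: "Equals", image: "=" },
]);

console.log(lexed.tokens.length); // 2
console.log((lexed.groups["comments"] || []).length); // 1
```

The parser would then see only the main token list, while the comments remain available for post processing.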


label?: string

The label is a human-readable name used in error messages and syntax diagrams.

For example, a TokenType may be called LCurly, which is short for "left curly brace". A much easier to understand label would simply be "{".

line_breaks?: boolean

Can a string matching this Token Type's pattern possibly contain a line terminator? If it can, and the line_breaks property is not set to true, the Lexer's line / column tracking will be inaccurate.
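Why the flag matters can be seen in a small position-tracking sketch. The advance function and SketchPosition type below are hypothetical and only model the idea; they are not the Lexer's actual implementation, and only "\n" is treated as a line terminator here.

```typescript
// Hypothetical position model: 1-based line and column.
interface SketchPosition {
  line: number;
  column: number;
}

// Returns the position just after `image`, given the position where `image` starts.
function advance(start: SketchPosition, image: string, lineBreaks: boolean): SketchPosition {
  if (!lineBreaks) {
    // The image is assumed to stay on one line, so only the column moves.
    return { line: start.line, column: start.column + image.length };
  }
  let line = start.line;
  let column = start.column;
  for (let i = 0; i < image.length; i++) {
    if (image.charAt(i) === "\n") {
      line++;
      column = 1;
    } else {
      column++;
    }
  }
  return { line: line, column: column };
}

const multiLineComment = "/* a\nb */";
const origin: SketchPosition = { line: 1, column: 1 };

console.log(advance(origin, multiLineComment, true)); // { line: 2, column: 5 }
console.log(advance(origin, multiLineComment, false)); // { line: 1, column: 10 }
```

With the flag left off, every token after the comment would be reported one line too early in this model.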

longer_alt?: TokenType | TokenType[]

The "longer_alt" property will cause the Lexer to attempt matching against other Token Types every time this Token Type has been matched.

This feature can be useful when two or more Token Types have common prefixes which cannot be resolved (only) by the ordering of the Tokens in the lexer definition.

  • Note that the longer_alt capability cannot be chained.
  • Note that the first matched longer_alt takes precedence.

For an example, see: resolving the keywords vs Identifier ambiguity.
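The keywords vs Identifier case can be modelled in a few lines. This is a self-contained sketch, not the Lexer's real algorithm: SketchTokenType, its longerAlt field, and matchAt are hypothetical names, and the rule "the first alternative that yields a longer match wins" is an assumption of this model.

```typescript
// Hypothetical, simplified Token Type; patterns must be anchored with ^ in this sketch.
interface SketchTokenType {
  name: string;
  pattern: RegExp;
  longerAlt?: SketchTokenType[];
}

interface SketchMatch {
  typeName: string;
  image: string;
}

// Tries each Token Type in definition order at `offset`.
function matchAt(text: string, offset: number, types: SketchTokenType[]): SketchMatch | null {
  const rest = text.slice(offset);
  for (const type of types) {
    const m = type.pattern.exec(rest);
    if (m === null) {
      continue;
    }
    // After a match, try the longer alternatives in order. They are not chained:
    // the alternatives' own longerAlt lists are never consulted.
    for (const alt of type.longerAlt || []) {
      const altMatch = alt.pattern.exec(rest);
      if (altMatch !== null && altMatch[0].length > m[0].length) {
        return { typeName: alt.name, image: altMatch[0] };
      }
    }
    return { typeName: type.name, image: m[0] };
  }
  return null;
}

const IdentifierType: SketchTokenType = { name: "Identifier", pattern: /^[a-zA-Z_]\w*/ };
const WhileType: SketchTokenType = { name: "While", pattern: /^while/, longerAlt: [IdentifierType] };

// The keyword is listed first, so ordering alone would split "whileLoop" into "while" + "Loop".
const allTypes = [WhileType, IdentifierType];

console.log(matchAt("while (x)", 0, allTypes)); // While, image "while"
console.log(matchAt("whileLoop = 1", 0, allTypes)); // Identifier, image "whileLoop"
```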

name: string
pattern?: TokenPattern

This defines the sequence of characters that will be matched to this TokenType when lexing.

For Custom Patterns see:

pop_mode?: boolean

If "pop_mode" is true, the Lexer will pop the last mode off the modes stack and continue lexing using the mode that is now at the top of the stack.


push_mode?: string

The name of a Lexer mode to "enter" once this Token Type has been matched. Lexer modes can be used to support different sets of possible Token Types.

Lexer Modes work as a stack of Lexers, so "entering" a mode means pushing it to the top of the stack.
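The stack behaviour of push_mode and pop_mode can be sketched as follows. SketchTokenConfig and nextModeStack are hypothetical names for this illustration. Two behaviours are assumptions of the sketch rather than statements about the Lexer: the initial mode is never popped, and when both properties are set the pop is applied before the push.

```typescript
// Hypothetical subset of the configuration described above.
interface SketchTokenConfig {
  name: string;
  push_mode?: string;
  pop_mode?: boolean;
}

// Returns the mode stack after a token of the given configuration has been matched.
// The current mode is always the last element of the returned array.
function nextModeStack(modeStack: string[], matched: SketchTokenConfig): string[] {
  const next = modeStack.slice();
  // Assumption of this sketch: keep the initial mode so the stack never becomes empty.
  if (matched.pop_mode === true && next.length > 1) {
    next.pop();
  }
  if (matched.push_mode !== undefined) {
    next.push(matched.push_mode);
  }
  return next;
}

const OpenTag: SketchTokenConfig = { name: "OpenTag", push_mode: "insideTag" };
const CloseTag: SketchTokenConfig = { name: "CloseTag", pop_mode: true };

const initialStack = ["outsideTag"];
const afterOpen = nextModeStack(initialStack, OpenTag);
const afterClose = nextModeStack(afterOpen, CloseTag);

console.log(afterOpen); // [ "outsideTag", "insideTag" ]
console.log(afterClose); // [ "outsideTag" ]
```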


start_chars_hint?: (string | number)[]

Possible starting characters or charCodes of the pattern. These will be used to optimize the Lexer's performance.

These are normally computed automatically. However, specifying them explicitly can enable optimizations even when the automatic analysis fails.


  • String hints should be one character long, for example:
      { start_chars_hint: ["a", "b"] }
  • Number hints are the result of calling ".charCodeAt(0)" on the strings, for example:
      { start_chars_hint: [97, 98] }
  • For Unicode characters outside the BMP, use the first code unit of their surrogate pair. For example, the '💩' character is represented by the surrogate pair '\uD83D\uDCA9', and D83D is 55357 in decimal.
  • Note that "💩".charCodeAt(0) === 55357
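The relationship between string hints and number hints can be checked with standard string methods only. The toCharCodeHints helper below is a hypothetical convenience written for this example; it is not part of the library.

```typescript
// Normalizes a mixed list of hints to charCodes, using the first UTF-16 code unit of each string.
function toCharCodeHints(hints: (string | number)[]): number[] {
  const codes: number[] = [];
  for (const hint of hints) {
    codes.push(typeof hint === "string" ? hint.charCodeAt(0) : hint);
  }
  return codes;
}

console.log(toCharCodeHints(["a", "b"])); // [ 97, 98 ]

// "\uD83D\uDCA9" is the surrogate pair for the '💩' character (outside the BMP).
// charCodeAt(0) yields the first (high) surrogate, 0xD83D.
console.log(toCharCodeHints(["\uD83D\uDCA9"])); // [ 55357 ]
console.log((0xd83d).toString(10)); // "55357"
```

Normalizing to numbers this way makes it easy to confirm that a string hint and its numeric form describe the same starting character.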
