- Structure composed of constructor functions for the Token types this lexer will support.
In the case of {SingleModeLexerDefinition} the structure is simply an array of Token constructors. In the case of {MultiModeLexerWDefinition} the structure is an object where each value is an array of Token constructors.
For example: { "modeX" : [Token1, Token2], "modeY" : [Token3, Token4] }
A lexer with {MultiModeLexerWDefinition} is simply multiple Lexers where only one (mode) can be active at the same time. This is useful for lexing languages where there are different lexing rules depending on context.
The current lexing mode is selected via a "mode stack". The last (peek) value in the stack will be the current mode of the lexer.
Each Token class can define that, after an instance of the Token has been consumed, it will cause the Lexer to do one of the following:
PUSH_MODE : push a new mode onto the "mode stack"
POP_MODE : pop the last mode from the "mode stack"
Examples:
    export class Attribute extends Token {
        static PATTERN = ...
        static PUSH_MODE = "modeY"
    }
    export class EndAttribute extends Token {
        static PATTERN = ...
        static POP_MODE = true
    }
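The mode stack described above can be sketched in a few lines. This is only an illustration of the idea, not the library's implementation: TokenSpec, MultiModeDefinition, LexedToken and lexWithModes are hypothetical names, plain objects stand in for Token classes, and the pushMode/popMode fields mirror the static PUSH_MODE/POP_MODE properties.

```typescript
// Illustrative sketch only; all names here are hypothetical.
interface TokenSpec {
  name: string;
  pattern: RegExp;
  pushMode?: string; // mirrors static PUSH_MODE
  popMode?: boolean; // mirrors static POP_MODE
}

type MultiModeDefinition = { [mode: string]: TokenSpec[] };

interface LexedToken {
  name: string;
  image: string;
  mode: string; // the mode that was active when the token was matched
}

function lexWithModes(
  definition: MultiModeDefinition,
  initialMode: string,
  text: string
): LexedToken[] {
  const modeStack: string[] = [initialMode];
  const tokens: LexedToken[] = [];
  let offset = 0;

  while (offset < text.length) {
    // The last (peek) value of the stack selects the active token list.
    const mode = modeStack[modeStack.length - 1];
    const rest = text.slice(offset);
    let matched = false;

    for (const spec of definition[mode]) {
      // Anchor the pattern at the current offset.
      const match = new RegExp("^(?:" + spec.pattern.source + ")").exec(rest);
      if (match === null || match[0].length === 0) continue;

      tokens.push({ name: spec.name, image: match[0], mode });
      offset += match[0].length;

      // Mode changes take effect only after the token has been consumed.
      if (spec.pushMode !== undefined) modeStack.push(spec.pushMode);
      if (spec.popMode && modeStack.length > 1) modeStack.pop();

      matched = true;
      break;
    }

    if (!matched) {
      throw new Error(`No token matches at offset ${offset} in mode "${mode}"`);
    }
  }
  return tokens;
}

const definition: MultiModeDefinition = {
  modeX: [
    { name: "Attribute", pattern: /\[/, pushMode: "modeY" },
    { name: "Word", pattern: /[a-z]+/ },
  ],
  modeY: [
    { name: "EndAttribute", pattern: /\]/, popMode: true },
    { name: "Digits", pattern: /[0-9]+/ },
  ],
};

const modeTokens = lexWithModes(definition, "modeX", "ab[12]cd");
console.log(modeTokens.map((t) => `${t.name}@${t.mode}`).join(" "));
// prints "Word@modeX Attribute@modeX Digits@modeY EndAttribute@modeY Word@modeX"
```

Digits are only recognised while "modeY" is on top of the stack, which is the kind of context-dependent rule that multiple modes are meant for.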
The Token constructors must be in one of these forms:
With a PATTERN property that has a RegExp value for tokens to match: example: -->class Integer extends Token { static PATTERN = /[1-9]\d*/ }<--
With a PATTERN property that has the value of the var Lexer.NA defined above. This is a convenience form used to avoid matching Token classes that only act as categories. example: -->class Keyword extends Token { static PATTERN = NA }<--
The following RegExp features are not supported:
a. '$' for match at end of input
b. the /g global flag
c. the /m multi-line flag
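A quick pre-check for these restrictions could look like the sketch below. The helper name findUnsupportedFeatures is hypothetical and not part of the library, and the '$' test is a deliberately simple heuristic that only looks for an unescaped '$' at the very end of the pattern source.

```typescript
// Hypothetical helper: reports which of the unsupported features a pattern uses.
function findUnsupportedFeatures(pattern: RegExp): string[] {
  const problems: string[] = [];
  if (pattern.global) problems.push("global flag");
  if (pattern.multiline) problems.push("multi-line flag");
  // Heuristic only: a trailing '$' that is not preceded by a backslash.
  // A '$' elsewhere in the pattern (e.g. inside an alternation) is not detected.
  if (/(^|[^\\])\$$/.test(pattern.source)) problems.push("end-of-input anchor");
  return problems;
}

console.log(findUnsupportedFeatures(/[1-9]\d*/)); // prints []
console.log(findUnsupportedFeatures(/abc$/gm));
// prints [ 'global flag', 'multi-line flag', 'end-of-input anchor' ]
```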
The Lexer will identify the first pattern that matches. Therefore, the order of Token constructors may be significant, for example when one pattern may match a prefix of another pattern.
Note that there are situations in which we may wish to order the longer pattern after the shorter one. For example: keywords vs. Identifiers, such as 'do' (/do/) and 'donald' (/\w+/).
If the Identifier pattern appears before the 'do' pattern, both 'do' and 'donald' will be lexed as an Identifier.
If the 'do' pattern appears before the Identifier pattern, 'do' will be lexed correctly as a keyword. However, 'donald' will be lexed as TWO separate tokens: keyword 'do' and identifier 'nald'.
To resolve this problem, add a static property named LONGER_ALT on the keyword's constructor. Example:
    export class Identifier extends Token { static PATTERN = /[_a-zA-Z][_a-zA-Z0-9]*/ }
    export class Keyword extends Token {
        static PATTERN = lex.NA
        static LONGER_ALT = Identifier
    }
    export class Do extends Keyword { static PATTERN = /do/ }
    export class While extends Keyword { static PATTERN = /while/ }
    export class Return extends Keyword { static PATTERN = /return/ }
The lexer will then also attempt to match a (longer) Identifier each time a keyword is matched.
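The ordering problem and the effect of LONGER_ALT can be reproduced with a small first-match lexer. This is a sketch under stated assumptions, not the library's code: Rule, matchAtStart, nextToken and lexFirstMatch are hypothetical names, rules are plain objects rather than Token classes, and longerAlt is set directly on the 'do' rule instead of being inherited from a Keyword base class.

```typescript
// Illustrative sketch only; all names here are hypothetical.
interface Rule {
  name: string;
  pattern: RegExp;
  longerAlt?: Rule; // mirrors static LONGER_ALT
}

// Returns the text matched at the start of the input, or undefined.
function matchAtStart(pattern: RegExp, input: string): string | undefined {
  const m = new RegExp("^(?:" + pattern.source + ")").exec(input);
  return m !== null && m[0].length > 0 ? m[0] : undefined;
}

// Picks the first rule that matches; if that rule has a longer alternative
// which matches more text, the alternative wins instead.
function nextToken(rules: Rule[], input: string): { rule: Rule; image: string } {
  for (const rule of rules) {
    const image = matchAtStart(rule.pattern, input);
    if (image === undefined) continue;
    const alt = rule.longerAlt;
    if (alt !== undefined) {
      const altImage = matchAtStart(alt.pattern, input);
      if (altImage !== undefined && altImage.length > image.length) {
        return { rule: alt, image: altImage };
      }
    }
    return { rule, image };
  }
  throw new Error(`No rule matches "${input}"`);
}

function lexFirstMatch(rules: Rule[], text: string): string[] {
  const out: string[] = [];
  let offset = 0;
  while (offset < text.length) {
    const { rule, image } = nextToken(rules, text.slice(offset));
    out.push(`${rule.name}(${image})`);
    offset += image.length;
  }
  return out;
}

const identifier: Rule = { name: "Identifier", pattern: /[_a-zA-Z][_a-zA-Z0-9]*/ };
const doPlain: Rule = { name: "Do", pattern: /do/ };
const doWithAlt: Rule = { name: "Do", pattern: /do/, longerAlt: identifier };

console.log(lexFirstMatch([identifier, doPlain], "do").join(" "));
// prints "Identifier(do)"
console.log(lexFirstMatch([doPlain, identifier], "donald").join(" "));
// prints "Do(do) Identifier(nald)"
console.log(lexFirstMatch([doWithAlt, identifier], "donald").join(" "));
// prints "Identifier(donald)"
console.log(lexFirstMatch([doWithAlt, identifier], "do").join(" "));
// prints "Do(do)"
```

The keyword rule still comes first, so a bare 'do' stays a keyword, while the longer alternative takes over whenever it consumes more input.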
Lexes (tokenizes) a string. Note that this can be called repeatedly on different strings, as this method does not modify the state of the Lexer.
the string to lex