Lexer | Chevrotain

Hierarchy

Lexer

Index

Constructors

constructor

Properties

Methods

tokenize

Object literals

SKIPPED

Constructors

constructor

new Lexer(lexerDefinition: SingleModeLexerDefinition | MultiModeLexerWDefinition, deferDefinitionErrorsHandling?: boolean): Lexer

- Defined in scan/lexer_public.ts:57
Parameters
- lexerDefinition: SingleModeLexerDefinition | MultiModeLexerWDefinition
  - Structure composed of constructor functions for the Token types this lexer will support.
  
  In the case of {SingleModeLexerDefinition} the structure is simply an array of Token constructors. In the case of {MultiModeLexerWDefinition} the structure is an object where each value is an array of Token constructors.
  
  For example: { "modeX" : [Token1, Token2], "modeY" : [Token3, Token4] }
  
  A lexer with {MultiModeLexerWDefinition} is effectively multiple Lexers of which only one (the current mode) can be active at a time. This is useful for lexing languages whose lexing rules depend on context.
  
  The current lexing mode is selected via a "mode stack". The last (peek) value in the stack is the current mode of the lexer.
  
  Each Token class can define that, after an instance of the Token is consumed, the Lexer will do one of the following:
  1. PUSH_MODE : push a new mode to the "mode stack"
  2. POP_MODE : pop the last mode from the "mode stack"
    
    Examples:
    
    export class Attribute extends Token {
      static PATTERN = ...
      static PUSH_MODE = "modeY"
    }
    
    export class EndAttribute extends Token {
      static PATTERN = ...
      static POP_MODE = true
    }
    
  The Token constructors must be in one of these forms:
  1. With a PATTERN property that has a RegExp value for tokens to match. Example: class Integer extends Token { static PATTERN = /[1-9]\d*/ }
  2. With a PATTERN property that has the value of the static property Lexer.NA. This is a convenience form used to avoid matching Token classes that only act as categories. Example: class Keyword extends Token { static PATTERN = Lexer.NA }
  
  The following RegExp features are not supported:
  a. '$' for match at end of input
  b. the 'g' (global) flag
  c. the 'm' (multi-line) flag
  
  The Lexer will identify the first pattern that matches. Therefore the order of Token constructors may be significant, for example when one pattern may match a prefix of another pattern.
  
  Note that there are situations in which we may wish to order the longer pattern after the shorter one. For example: keywords vs. identifiers, such as 'do' (/do/) and 'donald' (/\w+/).
  - If the Identifier pattern appears before the 'do' pattern, both 'do' and 'donald' will be lexed as an Identifier.
  - If the 'do' pattern appears before the Identifier pattern, 'do' will be lexed correctly as a keyword. However, 'donald' will be lexed as TWO separate tokens: keyword 'do' and identifier 'nald'.
    
    To resolve this problem, add a static property named LONGER_ALT on the keyword's constructor. Example:
    
    export class Identifier extends Token { static PATTERN = /[_a-zA-Z][_a-zA-Z0-9]*/ }
    export class Keyword extends Token {
      static PATTERN = Lexer.NA
      static LONGER_ALT = Identifier
    }
    export class Do extends Keyword { static PATTERN = /do/ }
    export class While extends Keyword { static PATTERN = /while/ }
    export class Return extends Keyword { static PATTERN = /return/ }
    
    The lexer will then also attempt to match a (longer) Identifier each time a keyword is matched.
- Default value deferDefinitionErrorsHandling: boolean = false
Returns Lexer
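
The keyword vs. identifier ordering problem described for the lexerDefinition parameter can be reproduced with a small stand-alone sketch. This is not Chevrotain's implementation: the TokenDef shape, the longerAlt field (standing in for LONGER_ALT), and the lex and matchAt helpers are all hypothetical names introduced only for illustration.

```typescript
// Hypothetical mini-lexer illustrating "first match wins" and a longer alternative.
// None of these names come from Chevrotain.
interface TokenDef {
  name: string;
  pattern: RegExp;
  longerAlt?: TokenDef; // plays the role of LONGER_ALT
}

interface Tok {
  type: string;
  image: string;
}

// Match `pattern` only at position `offset`, using the sticky ("y") flag.
function matchAt(pattern: RegExp, text: string, offset: number): string | null {
  const sticky = new RegExp(pattern.source, "y");
  sticky.lastIndex = offset;
  const m = sticky.exec(text);
  return m === null ? null : m[0];
}

function lex(text: string, defs: TokenDef[]): Tok[] {
  const out: Tok[] = [];
  let offset = 0;
  while (offset < text.length) {
    let found: Tok | null = null;
    for (const def of defs) {
      const image = matchAt(def.pattern, text, offset);
      if (image === null || image.length === 0) continue;
      found = { type: def.name, image };
      // The first matching pattern wins, unless its longer alternative
      // matches strictly more text at the same offset.
      if (def.longerAlt) {
        const alt = matchAt(def.longerAlt.pattern, text, offset);
        if (alt !== null && alt.length > image.length) {
          found = { type: def.longerAlt.name, image: alt };
        }
      }
      break;
    }
    if (found === null) throw new Error(`unexpected character at offset ${offset}`);
    if (found.type !== "WhiteSpace") out.push(found); // whitespace is dropped
    offset += found.image.length;
  }
  return out;
}

const Identifier: TokenDef = { name: "Identifier", pattern: /[_a-zA-Z][_a-zA-Z0-9]*/ };
const WhiteSpace: TokenDef = { name: "WhiteSpace", pattern: /\s+/ };
const DoPlain: TokenDef = { name: "Do", pattern: /do/ };
const DoWithAlt: TokenDef = { name: "Do", pattern: /do/, longerAlt: Identifier };

// Keyword listed first, no longer alternative: 'donald' splits into 'do' + 'nald'.
const naive = lex("do donald", [WhiteSpace, DoPlain, Identifier]);
// Same order, but the keyword names Identifier as its longer alternative.
const fixed = lex("do donald", [WhiteSpace, DoWithAlt, Identifier]);

console.log(naive.map((t) => `${t.type}:${t.image}`)); // Do:do, Do:do, Identifier:nald
console.log(fixed.map((t) => `${t.type}:${t.image}`)); // Do:do, Identifier:donald
```

In this sketch the longer alternative only replaces the keyword when it matches strictly more text, so a bare 'do' is still reported as a keyword.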

Properties

Protected allPatterns

allPatterns: object

Type declaration

[modeName: string]: RegExp[]

Protected emptyGroups

emptyGroups: object

Type declaration

[groupName: string]: Token

Protected lexerDefinition

lexerDefinition: SingleModeLexerDefinition | MultiModeLexerWDefinition

- Structure composed of constructor functions for the Token types this lexer will support. The full description (modes, supported pattern forms, and LONGER_ALT) is given under the lexerDefinition parameter of the constructor above.

lexerDefinitionErrors

lexerDefinitionErrors: Array<any>

Protected modes

modes: string[]

Protected patternIdxToCanLineTerminator

patternIdxToCanLineTerminator: object

Type declaration

[modeName: string]: boolean[]

Protected patternIdxToClass

patternIdxToClass: object

Type declaration

[modeName: string]: Function[]

Protected patternIdxToGroup

patternIdxToGroup: object

Type declaration

[modeName: string]: string[]

Protected patternIdxToLongerAltIdx

patternIdxToLongerAltIdx: object

Type declaration

[modeName: string]: number[]

Protected patternIdxToPopMode

patternIdxToPopMode: object

Type declaration

[modeName: string]: boolean[]

Protected patternIdxToPushMode

patternIdxToPushMode: object

Type declaration

[modeName: string]: string[]

Static NA

NA: RegExp

Methods

tokenize

tokenize(text: string, initialMode?: string): ILexingResult

- Defined in scan/lexer_public.ts:196
Lexes (tokenizes) a string. This method can be called repeatedly on different strings because it does not modify the state of the Lexer.

Parameters
- text: string
  
  the string to lex
- Default value initialMode: string = first(this.modes)
Returns ILexingResult
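
The interplay between tokenize, initialMode, and the "mode stack" can be sketched with a small stand-alone class. This is an illustration under stated assumptions, not Chevrotain's implementation: ModalLexer, ModalTokenDef, pushMode, and popMode are hypothetical names (the last two standing in for PUSH_MODE and POP_MODE), and the choice not to pop the last remaining mode is an assumption of this sketch.

```typescript
// Hypothetical multi-mode mini-lexer; none of these names come from Chevrotain.
interface ModalTokenDef {
  name: string;
  pattern: RegExp;
  pushMode?: string; // plays the role of PUSH_MODE
  popMode?: boolean; // plays the role of POP_MODE
}

type ModalDefinition = { [modeName: string]: ModalTokenDef[] };

interface ModalToken {
  type: string;
  image: string;
  mode: string; // mode that was active when the token was matched
}

class ModalLexer {
  private readonly definition: ModalDefinition;
  private readonly modes: string[];

  constructor(definition: ModalDefinition) {
    this.definition = definition;
    this.modes = Object.keys(definition);
  }

  // The mode stack is local to each call, so repeated calls share no state.
  // initialMode defaults to the first mode of the definition.
  tokenize(text: string, initialMode: string = this.modes[0]): ModalToken[] {
    const modeStack: string[] = [initialMode];
    const tokens: ModalToken[] = [];
    let offset = 0;
    while (offset < text.length) {
      const mode = modeStack[modeStack.length - 1]; // peek: the current mode
      let matched = false;
      for (const def of this.definition[mode]) {
        const sticky = new RegExp(def.pattern.source, "y");
        sticky.lastIndex = offset;
        const m = sticky.exec(text);
        if (m === null || m[0].length === 0) continue;
        tokens.push({ type: def.name, image: m[0], mode });
        offset += m[0].length;
        // Assumption of this sketch: never pop the last remaining mode.
        if (def.popMode && modeStack.length > 1) modeStack.pop();
        if (def.pushMode !== undefined) modeStack.push(def.pushMode);
        matched = true;
        break;
      }
      if (!matched) throw new Error(`no token matches at offset ${offset} in mode ${mode}`);
    }
    return tokens;
  }
}

const modalLexer = new ModalLexer({
  text: [
    { name: "Open", pattern: /</, pushMode: "tag" },
    { name: "Chars", pattern: /[^<]+/ },
  ],
  tag: [
    { name: "Close", pattern: />/, popMode: true },
    { name: "Name", pattern: /[a-z]+/ },
  ],
});

const first = modalLexer.tokenize("hi<b>yo"); // starts in "text", the first mode
const again = modalLexer.tokenize("hi<b>yo"); // same input, same result
const inTag = modalLexer.tokenize("b>", "tag"); // explicit initial mode

console.log(first.map((t) => `${t.type}@${t.mode}`));
// Chars@text, Open@text, Name@tag, Close@tag, Chars@text
console.log(inTag.map((t) => t.type)); // Name, Close
```

Keeping the mode stack as a local variable of tokenize is one simple way to get the property described above: the same lexer instance can be reused across inputs without any reset step.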

Object literals

Static SKIPPED

description