Lexer | chevrotain

Hierarchy

Lexer

Constructors

constructor

new Lexer(lexerDefinition: SingleModeLexerDefinition | IMultiModeLexerDefinition, config?: ILexerConfig): Lexer

- Defined in chevrotain.d.ts:266
Parameters
- lexerDefinition: SingleModeLexerDefinition | IMultiModeLexerDefinition
  - Structure composed of constructor functions for the Token types this lexer will support.
  
  In the case of {SingleModeLexerDefinition} the structure is simply an array of TokenTypes. In the case of {IMultiModeLexerDefinition} the structure is an object with two properties:
  1. a "modes" property where each value is an array of TokenTypes.
  2. a "defaultMode" property specifying the initial lexer mode.
    
    For example:

        {
          "modes": {
            "modeX": [Token1, Token2],
            "modeY": [Token3, Token4]
          },
          "defaultMode": "modeY"
        }
    
    A lexer with an {IMultiModeLexerDefinition} is effectively several lexers, of which only one (the current mode) is active at any time. This is useful for lexing languages whose lexing rules depend on context.
    
    The current lexing mode is selected via a "mode stack". The last (peek) value in the stack will be the current mode of the lexer.
    
    After an "instance" of a Token Type has been consumed, that Token Type can instruct the Lexer to do one of the following:

    PUSH_MODE : push a new mode onto the "mode stack"

    POP_MODE : pop the last mode from the "mode stack"

    Examples:

        export class Attribute {
          static PATTERN = ...
          static PUSH_MODE = "modeY"
        }

        export class EndAttribute {
          static PATTERN = ...
          static POP_MODE = true
        }
    
    The TokenTypes must be in one of these forms:
    
    1. With a PATTERN property that has a RegExp value for tokens to match. Example: class Integer { static PATTERN = /[1-9]\d*/ }

    2. With a PATTERN property that has the value Lexer.NA. This is a convenience form used to avoid matching Token classes that only act as categories. Example: class Keyword { static PATTERN = Lexer.NA }

  The following RegExp features are not supported:
  a. '$' for match at end of input
  b. /g global flag
  c. /m multi-line flag
  
  The Lexer will identify the first pattern that matches; therefore the order of Token Constructors may be significant, for example when one pattern may match a prefix of another pattern.
  
  Note that there are situations in which we may wish to order the longer pattern after the shorter one. For example, keywords vs. identifiers: 'do' (/do/) and 'donald' (/\w+/).
  - If the Identifier pattern appears before the 'do' pattern, both 'do' and 'donald' will be lexed as an Identifier.
  - If the 'do' pattern appears before the Identifier pattern, 'do' will be lexed correctly as a keyword. However, 'donald' will be lexed as TWO separate tokens: keyword 'do' and identifier 'nald'.
    
    To resolve this problem, add a static property named LONGER_ALT on the keyword's constructor. Example:

        export class Identifier extends Token { static PATTERN = /[_a-zA-Z][_a-zA-Z0-9]*/ }

        export class Keyword extends Token {
          static PATTERN = Lexer.NA
          static LONGER_ALT = Identifier
        }

        export class Do extends Keyword { static PATTERN = /do/ }
        export class While extends Keyword { static PATTERN = /while/ }
        export class Return extends Keyword { static PATTERN = /return/ }
    
    The lexer will then also attempt to match a (longer) Identifier each time a keyword is matched.
- Optional config: ILexerConfig
Returns Lexer
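
The ordering rule and the LONGER_ALT fix described for lexerDefinition can be sketched in plain TypeScript. This is a minimal illustration, not Chevrotain's implementation: the `TokenType` interface, the `longerAlt` field, and the `matchAt`, `lex`, and `render` helpers are hypothetical names invented for this example, and it assumes "first match wins" plus a single longer-alternative check are the only rules in play.

```typescript
// Illustrative sketch only: hypothetical types and helpers, not Chevrotain's code.
interface TokenType {
  name: string;
  pattern: RegExp;        // assumed to use neither the g nor the m flag
  longerAlt?: TokenType;  // plays the role of the static LONGER_ALT property
}

interface Tok {
  type: string;
  image: string;
}

// Try `pattern` at exactly `offset` by building a sticky copy of it.
function matchAt(pattern: RegExp, text: string, offset: number): string | null {
  const sticky = new RegExp(pattern.source, "y");
  sticky.lastIndex = offset;
  const m = sticky.exec(text);
  return m === null ? null : m[0];
}

function lex(text: string, types: TokenType[]): Tok[] {
  const out: Tok[] = [];
  let offset = 0;
  while (offset < text.length) {
    let found: Tok | null = null;
    for (const t of types) {
      const image = matchAt(t.pattern, text, offset);
      if (image === null || image.length === 0) continue;
      found = { type: t.name, image };
      // If a longer alternative is declared, prefer it when it matches more text.
      if (t.longerAlt !== undefined) {
        const alt = matchAt(t.longerAlt.pattern, text, offset);
        if (alt !== null && alt.length > image.length) {
          found = { type: t.longerAlt.name, image: alt };
        }
      }
      break; // the first matching pattern wins
    }
    if (found === null) throw new Error(`unexpected character at offset ${offset}`);
    if (found.type !== "WhiteSpace") out.push(found); // whitespace is dropped
    offset += found.image.length;
  }
  return out;
}

const Identifier: TokenType = { name: "Identifier", pattern: /[_a-zA-Z][_a-zA-Z0-9]*/ };
const WhiteSpace: TokenType = { name: "WhiteSpace", pattern: /\s+/ };
const DoNoAlt: TokenType = { name: "Do", pattern: /do/ };
const DoWithAlt: TokenType = { name: "Do", pattern: /do/, longerAlt: Identifier };

const render = (toks: Tok[]): string => toks.map(t => `${t.type}:${t.image}`).join(" ");

// Identifier listed first: the keyword is never recognized.
const identifierFirst = render(lex("do donald", [WhiteSpace, Identifier, DoNoAlt]));
// → "Identifier:do Identifier:donald"

// Keyword listed first, no longer alternative: 'donald' is split.
const keywordFirst = render(lex("do donald", [WhiteSpace, DoNoAlt, Identifier]));
// → "Do:do Do:do Identifier:nald"

// Keyword listed first with a longer alternative: both inputs lex as intended.
const withLongerAlt = render(lex("do donald", [WhiteSpace, DoWithAlt, Identifier]));
// → "Do:do Identifier:donald"
```

The third call shows why the longer alternative is attached to the keyword rather than solved by reordering: the keyword still wins on an exact match, and the Identifier only takes over when it consumes more input.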

Properties

Protected defaultMode

defaultMode: string

Protected emptyGroups

emptyGroups: object

Type declaration

[groupName: string]: IToken

Protected lexerDefinition

lexerDefinition: SingleModeLexerDefinition | IMultiModeLexerDefinition

lexerDefinitionErrors

lexerDefinitionErrors: ILexerDefinitionError[]

Protected modes

modes: string[]

Protected patternIdxToConfig

patternIdxToConfig: any

Static NA

NA: RegExp

Static SKIPPED

SKIPPED: string

Methods

tokenize

tokenize(text: string, initialMode?: string): ILexingResult

- Defined in chevrotain.d.ts:365
Lexes (tokenizes) a string. This method can be called repeatedly on different strings because it does not modify the state of the Lexer.

Parameters
- text: string
  
  The string to lex
- Optional initialMode: string
Returns ILexingResult
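
To illustrate how an optional initial mode and the "mode stack" could interact, and why repeated calls need not share state, here is a self-contained sketch. It is not Chevrotain's implementation: `ModalTokenType`, `MultiModeDefinition`, the `pushMode`/`popMode` fields, and this free-standing `tokenize` function are hypothetical stand-ins, and the choice to ignore a pop that would empty the stack is an assumption made only for this example.

```typescript
// Illustrative sketch only: hypothetical names, not Chevrotain's code.
interface ModalTokenType {
  name: string;
  pattern: RegExp;
  pushMode?: string;  // mirrors PUSH_MODE
  popMode?: boolean;  // mirrors POP_MODE
}

interface MultiModeDefinition {
  modes: { [modeName: string]: ModalTokenType[] };
  defaultMode: string;
}

function tokenize(
  def: MultiModeDefinition,
  text: string,
  initialMode: string = def.defaultMode
): string[] {
  // The stack is local to this call, so repeated calls never share state.
  const modeStack: string[] = [initialMode];
  const out: string[] = [];
  let offset = 0;
  while (offset < text.length) {
    const currentMode = modeStack[modeStack.length - 1]; // peek at the top of the stack
    const candidates = def.modes[currentMode];
    if (candidates === undefined) throw new Error(`unknown mode ${currentMode}`);
    let matched = false;
    for (const t of candidates) {
      const sticky = new RegExp(t.pattern.source, "y");
      sticky.lastIndex = offset;
      const m = sticky.exec(text);
      if (m === null || m[0].length === 0) continue;
      out.push(`${currentMode}/${t.name}`);
      offset += m[0].length;
      // Mode changes take effect only after the token has been consumed.
      // Assumption: a pop that would empty the stack is silently ignored.
      if (t.popMode && modeStack.length > 1) modeStack.pop();
      if (t.pushMode !== undefined) modeStack.push(t.pushMode);
      matched = true;
      break;
    }
    if (!matched) {
      throw new Error(`no token matches at offset ${offset} in mode ${currentMode}`);
    }
  }
  return out;
}

const definition: MultiModeDefinition = {
  modes: {
    text: [
      { name: "OpenAttr", pattern: /\[/, pushMode: "attr" },
      { name: "Word", pattern: /[^\[\]]+/ },
    ],
    attr: [
      { name: "CloseAttr", pattern: /\]/, popMode: true },
      { name: "Name", pattern: /[a-z]+/ },
    ],
  },
  defaultMode: "text",
};

const first = tokenize(definition, "ab[cd]ef").join(" ");
// → "text/Word text/OpenAttr attr/Name attr/CloseAttr text/Word"

const second = tokenize(definition, "cd", "attr").join(" ");
// → "attr/Name"

// Same input, same output: the earlier calls left nothing behind.
const again = tokenize(definition, "ab[cd]ef").join(" ");
```

Each output entry is tagged with the mode that was active when the token matched, which makes the push after `[` and the pop after `]` visible in the result.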
