
Class Lexer

Hierarchy

  • Lexer

Index

Constructors

constructor

  • Parameters

    • lexerDefinition: SingleModeLexerDefinition | MultiModeLexerWDefinition

      A structure composed of the constructor functions of the Token types this lexer will support.

      In the case of {SingleModeLexerDefinition} the structure is simply an array of Token constructors. In the case of {MultiModeLexerWDefinition} the structure is an object where each value is an array of Token constructors.

      For example: { "modeX" : [Token1, Token2], "modeY" : [Token3, Token4] }
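      The two definition shapes described above can be sketched with local stand-in types. This is only an illustration: `TokenConstructor`, `SingleModeDef`, `MultiModeDef`, `toModes` and the mode name `defaultMode` are hypothetical names chosen for this example, not the library's own declarations.

```typescript
// Local stand-ins for illustration only; the real types come from the library.
type TokenConstructor = new () => object;
type SingleModeDef = TokenConstructor[];
type MultiModeDef = { [modeName: string]: TokenConstructor[] };

class Token1 {}
class Token2 {}
class Token3 {}
class Token4 {}

// Normalizes either shape into the "object of arrays" form.
// Assumption: a single-mode definition is treated as one mode with a
// made-up name ("defaultMode").
function toModes(def: SingleModeDef | MultiModeDef): MultiModeDef {
    return Array.isArray(def) ? { defaultMode: def } : def;
}

const singleModeNames = Object.keys(toModes([Token1, Token2]));
const multiModeNames = Object.keys(
    toModes({ modeX: [Token1, Token2], modeY: [Token3, Token4] })
);
```

      In this sketch, a single-mode definition simply becomes a one-entry object, which makes it easy to see why a multi-mode lexer behaves like several lexers keyed by mode name.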

      A lexer with {MultiModeLexerWDefinition} is simply multiple Lexers where only one (mode) can be active at the same time. This is useful for lexing languages where there are different lexing rules depending on context.

      The current lexing mode is selected via a "mode stack". The last (peek) value in the stack will be the current mode of the lexer.

      Each Token class can specify that, after an instance of that Token is consumed, the Lexer will perform one of the following actions:

      1. PUSH_MODE : push a new mode to the "mode stack"
      2. POP_MODE : pop the last mode from the "mode stack"

        Examples:

        export class Attribute extends Token {
            static PATTERN = ...
            static PUSH_MODE = "modeY"
        }

        export class EndAttribute extends Token {
            static PATTERN = ...
            static POP_MODE = true
        }
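        The effect of PUSH_MODE and POP_MODE on the "mode stack" can be modelled with a few lines of code. The following is a toy model under stated assumptions, not the library's implementation: `ModeAction`, `applyModeAction` and `currentMode` are hypothetical names, and the rule that the initial mode is never popped is an assumption made for this sketch.

```typescript
// Toy model of the "mode stack"; not the library's implementation.
interface ModeAction {
    pushMode?: string; // mirrors a Token class's static PUSH_MODE
    popMode?: boolean; // mirrors a Token class's static POP_MODE
}

// Applies the mode action of one consumed token and returns the new stack.
function applyModeAction(stack: string[], action: ModeAction): string[] {
    const next = stack.slice();
    // Assumption of this sketch: the initial mode is never popped.
    if (action.popMode === true && next.length > 1) {
        next.pop();
    }
    if (action.pushMode !== undefined) {
        next.push(action.pushMode);
    }
    return next;
}

// The current mode is the last (peek) value in the stack.
function currentMode(stack: string[]): string {
    return stack[stack.length - 1];
}

let stack: string[] = ["modeX"];
stack = applyModeAction(stack, { pushMode: "modeY" }); // e.g. after an Attribute token
const insideAttribute = currentMode(stack);
stack = applyModeAction(stack, { popMode: true }); // e.g. after an EndAttribute token
const afterAttribute = currentMode(stack);
```

        Here the lexer starts in "modeX", switches to "modeY" once the pushing token is consumed, and returns to "modeX" after the popping token.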

        The Token constructors must be in one of these forms:

      1. With a PATTERN property that has a RegExp value for tokens to match. Example: class Integer extends Token { static PATTERN = /[1-9]\d*/ }

      2. With a PATTERN property that has the value of the static property Lexer.NA. This is a convenience form used to avoid matching Token classes that only act as categories. Example: class Keyword extends Token { static PATTERN = Lexer.NA }

      The following RegExp features are not supported:

      a. '$' for matching at the end of the input
      b. the /g global flag
      c. the /m multi-line flag
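      A pattern can be checked against these restrictions before it is handed to the lexer. The helper below is a rough sketch, not the library's own validation: `findUnsupportedFeatures` is a hypothetical name, and its '$' detection is a simple heuristic that only looks for a '$' not directly preceded by a backslash (it would, for instance, also flag a '$' inside a character class).

```typescript
// Rough heuristic check for the unsupported RegExp features listed above.
// Hypothetical helper for illustration; not part of the library.
function findUnsupportedFeatures(pattern: RegExp): string[] {
    const problems: string[] = [];
    // A '$' at the start of the source or after any character other than '\'.
    if (/(^|[^\\])\$/.test(pattern.source)) {
        problems.push("end of input anchor '$'");
    }
    if (pattern.global) {
        problems.push("global flag /g");
    }
    if (pattern.multiline) {
        problems.push("multi-line flag /m");
    }
    return problems;
}

const anchored = findUnsupportedFeatures(/abc$/);
const escapedDollar = findUnsupportedFeatures(/\$/);
const globalAndMultiline = findUnsupportedFeatures(/abc/gm);
```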

      The Lexer will identify the first pattern that matches; therefore the order of the Token constructors may be significant, for example when one pattern may match a prefix of another pattern.

      Note that there are situations in which we may wish to order the longer pattern after the shorter one. For example, keywords vs. Identifiers: 'do' (/do/) and 'donald' (/\w+/).

      • If the Identifier pattern appears before the 'do' pattern, both 'do' and 'donald' will be lexed as an Identifier.

      • If the 'do' pattern appears before the Identifier pattern, 'do' will be lexed correctly as a keyword. However, 'donald' will be lexed as TWO separate tokens: keyword 'do' and identifier 'nald'.

        To resolve this problem, add a static property named LONGER_ALT on the keyword's constructor. Example:

        export class Identifier extends Token { static PATTERN = /[_a-zA-Z][_a-zA-Z0-9]*/ }

        export class Keyword extends Token {
            static PATTERN = Lexer.NA
            static LONGER_ALT = Identifier
        }

        export class Do extends Keyword { static PATTERN = /do/ }
        export class While extends Keyword { static PATTERN = /while/ }
        export class Return extends Keyword { static PATTERN = /return/ }

        The lexer will then also attempt to match a (longer) Identifier each time a keyword is matched.
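        The ordering problem and the LONGER_ALT remedy can be reproduced with a small stand-alone matcher. This is a simplified model written for this explanation, not the library's algorithm: `TokenDef`, `matchAt` and `toyTokenize` are hypothetical names, plain objects stand in for Token classes, and characters that no pattern matches (such as the space) are silently skipped.

```typescript
// Simplified "first matching pattern wins" model; not the library's code.
interface TokenDef {
    name: string;
    pattern: RegExp;
    longerAlt?: TokenDef; // mirrors a Token class's static LONGER_ALT
}

// Matches `pattern` only at `offset`, using a sticky copy of the pattern.
function matchAt(pattern: RegExp, text: string, offset: number): string | null {
    const sticky = new RegExp(pattern.source, "y");
    sticky.lastIndex = offset;
    const m = sticky.exec(text);
    return m === null ? null : m[0];
}

function toyTokenize(text: string, defs: TokenDef[]): string[] {
    const out: string[] = [];
    let offset = 0;
    while (offset < text.length) {
        let matched = false;
        for (const def of defs) {
            let image = matchAt(def.pattern, text, offset);
            if (image === null || image.length === 0) continue;
            let name = def.name;
            // If a longer alternative is declared, prefer it when it matches more text.
            if (def.longerAlt !== undefined) {
                const alt = matchAt(def.longerAlt.pattern, text, offset);
                if (alt !== null && alt.length > image.length) {
                    image = alt;
                    name = def.longerAlt.name;
                }
            }
            out.push(`${name}:${image}`);
            offset += image.length;
            matched = true;
            break; // first matching pattern wins
        }
        if (!matched) offset++; // skip a character that no pattern matches
    }
    return out;
}

const Identifier: TokenDef = { name: "Identifier", pattern: /[_a-zA-Z][_a-zA-Z0-9]*/ };
const DoPlain: TokenDef = { name: "Do", pattern: /do/ };
const DoWithAlt: TokenDef = { name: "Do", pattern: /do/, longerAlt: Identifier };

const identifierFirst = toyTokenize("do donald", [Identifier, DoPlain]);
// → ["Identifier:do", "Identifier:donald"]
const keywordFirst = toyTokenize("do donald", [DoPlain, Identifier]);
// → ["Do:do", "Do:do", "Identifier:nald"]
const withLongerAlt = toyTokenize("do donald", [DoWithAlt, Identifier]);
// → ["Do:do", "Identifier:donald"]
```

        Only the third ordering, where the keyword comes first and declares the Identifier as its longer alternative, lexes both 'do' and 'donald' as intended.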

    • Default value deferDefinitionErrorsHandling: boolean = false

    Returns Lexer

Properties

Protected allPatterns

allPatterns: object

Type declaration

  • [modeName: string]: RegExp[]

Protected emptyGroups

emptyGroups: object

Type declaration

  • [groupName: string]: Token

Protected lexerDefinition

The same structure that is passed to the constructor as the lexerDefinition parameter. See the description of that constructor parameter above for the supported forms, lexing modes, and pattern rules.

lexerDefinitionErrors

lexerDefinitionErrors: Array<any>

Protected modes

modes: string[]

Protected patternIdxToCanLineTerminator

patternIdxToCanLineTerminator: object

Type declaration

  • [modeName: string]: boolean[]

Protected patternIdxToClass

patternIdxToClass: object

Type declaration

  • [modeName: string]: Function[]

Protected patternIdxToGroup

patternIdxToGroup: object

Type declaration

  • [modeName: string]: string[]

Protected patternIdxToLongerAltIdx

patternIdxToLongerAltIdx: object

Type declaration

  • [modeName: string]: number[]

Protected patternIdxToPopMode

patternIdxToPopMode: object

Type declaration

  • [modeName: string]: boolean[]

Protected patternIdxToPushMode

patternIdxToPushMode: object

Type declaration

  • [modeName: string]: string[]

Static NA

NA: RegExp

Methods

tokenize

  • Lexes (tokenizes) a string. This method can be called repeatedly on different strings, because it does not modify the state of the Lexer.

    Parameters

    • text: string

      the string to lex

    • Default value initialMode: string = first(this.modes)

    Returns ILexingResult


Object literals

Static SKIPPED

SKIPPED: object

description

description: string

Generated using TypeDoc