Chevrotain

Custom Token Patterns

Chevrotain is not limited to JavaScript regular expressions for defining Tokens. A Token's pattern can also be an arbitrary JavaScript function that follows the RegExp.prototype.exec contract, for example:

import { createToken } from "chevrotain";

// Our custom matcher: it receives the full input text and the offset at which to attempt a match.
function matchInteger(text, startOffset) {
  let endOffset = startOffset;
  let charCode = text.charCodeAt(endOffset);
  // Consume ASCII digits 0-9 (char codes 48-57).
  while (charCode >= 48 && charCode <= 57) {
    endOffset++;
    charCode = text.charCodeAt(endOffset);
  }

  // No match: return null to conform with the RegExp.prototype.exec signature.
  if (endOffset === startOffset) {
    return null;
  }

  // Per the RegExp.prototype.exec API, the first item in the returned array must be the whole matched string.
  const matchedString = text.substring(startOffset, endOffset);
  return [matchedString];
}

const IntegerToken = createToken({
  name: "IntegerToken",
  pattern: matchInteger,
  // An integer never contains a line terminator, so say so explicitly.
  line_breaks: false,
});
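
To make the matcher contract concrete, the function can be exercised directly, without a lexer. The following is a minimal sketch: it repeats the matcher logic so the snippet runs standalone, and the sample input string is only an illustration.

```javascript
// Same logic as matchInteger above, repeated so this snippet runs standalone.
function matchInteger(text, startOffset) {
  let endOffset = startOffset;
  let charCode = text.charCodeAt(endOffset);
  while (charCode >= 48 && charCode <= 57) {
    endOffset++;
    charCode = text.charCodeAt(endOffset);
  }
  return endOffset === startOffset
    ? null
    : [text.substring(startOffset, endOffset)];
}

// Illustrative input only.
const input = "x = 42;";

// Offset 4 points at the "4", so the matcher consumes "42".
const hit = matchInteger(input, 4);

// Offset 0 points at "x", which is not a digit, so there is no match.
const miss = matchInteger(input, 0);

// Past the end of the input charCodeAt returns NaN, which fails the digit check.
const atEnd = matchInteger(input, input.length);

console.log(hit, miss, atEnd);
```

Note that the matcher only looks for a match beginning exactly at startOffset; it never scans ahead for a later occurrence.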

This feature is often used to implement complex lexing logic, such as Python-style indentation.
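
As a rough illustration of the kind of context-sensitive logic a custom matcher can express, the sketch below matches a run of spaces only when it sits at the start of a line. The function name matchLineIndent and the sample source are hypothetical and not part of Chevrotain; a real indentation-aware lexer would also need to compare indentation levels and emit indent and dedent tokens, which this sketch does not attempt.

```javascript
// Hypothetical helper, not part of Chevrotain: matches leading spaces,
// but only when startOffset is at the very beginning of a line.
function matchLineIndent(text, startOffset) {
  // A line starts at offset 0 or right after a "\n" (char code 10).
  const atLineStart =
    startOffset === 0 || text.charCodeAt(startOffset - 1) === 10;
  if (!atLineStart) {
    return null;
  }

  // Consume consecutive spaces (char code 32).
  let endOffset = startOffset;
  while (text.charCodeAt(endOffset) === 32) {
    endOffset++;
  }

  // Same contract as RegExp.prototype.exec: null when nothing matched,
  // otherwise an array whose first item is the matched text.
  if (endOffset === startOffset) {
    return null;
  }
  return [text.substring(startOffset, endOffset)];
}

// Illustrative input only.
const source = "if x:\n    y = 1";

// Offset 6 is the first character after the "\n": four spaces of indentation.
const indent = matchLineIndent(source, 6);

// Offset 2 is the space between "if" and "x": mid-line, so no match.
const inner = matchLineIndent(source, 2);

console.log(indent, inner);
```

Because the decision depends on the character before startOffset, the matcher inspects the surrounding text rather than only the characters it consumes.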

See the in-depth guide for further details.

Last Updated: 7/9/23, 12:55 AM
Contributors: Shahar Soel, bd82