Semantics - CST Visitor
Run and Debug the source code.
Introduction
In the previous tutorial step we have implemented a parser for a "mini" SQL Select grammar. The current problem is that our parser only validates that the input conforms to the grammar -- in other words, it is just a recognizer. But in most real-world use cases the parser will also have to output some result/data structure/value.
This can be accomplished using a CST (Concrete Syntax Tree) Visitor defined outside our grammar:
- See the in-depth documentation of Chevrotain's CST capabilities
Enabling CST
This feature is automatically enabled when a Parser extends the Chevrotain CstParser class.
The invocation of any grammar rule will now automatically create a CST.
function parseInput(text) {
const lexingResult = SelectLexer.tokenize(text);
const parser = new SelectParser(lexingResult.tokens);
// CST automatically created.
const cstOutput = parser.selectStatement();
}
The CST Visitor
Creating a CST is not enough; we also need to traverse this structure and execute our actions (semantics).
Each Chevrotain parser instance exposes two BaseVisitor classes which can be extended to create custom user visitors.
// BaseVisitor constructors are accessed via a parser instance.
const parserInstance = new SelectParser();
const BaseSQLVisitor = parserInstance.getBaseCstVisitorConstructor();
// This BaseVisitor includes default visit methods that simply traverse the CST.
const BaseSQLVisitorWithDefaults =
parserInstance.getBaseCstVisitorConstructorWithDefaults();
class myCustomVisitor extends BaseSQLVisitor {
constructor() {
super();
// The "validateVisitor" method is a helper utility which performs static analysis
// to detect missing or redundant visitor methods
this.validateVisitor();
}
/* Visit methods go here */
}
class myCustomVisitorWithDefaults extends BaseSQLVisitorWithDefaults {
constructor() {
super();
this.validateVisitor();
}
/* Visit methods go here */
}
const myVisitorInstance = new myCustomVisitor();
const myVisitorInstanceWithDefaults = new myCustomVisitorWithDefaults();
In our example we will use the BaseVisitor constructor (without defaults).
Visitor Methods
So we now know how to create a CST visitor. But how do we actually make it perform the actions (semantics) we wish? For that we must create a visit method for each grammar rule.
Recall the selectClause grammar from the previous step:
selectClause:
"SELECT" Identifier ("," Identifier)*;
Let's create a visitor method for the selectClause rule.
class SQLToAstVisitor extends BaseSQLVisitor {
constructor() {
super();
this.validateVisitor();
}
// The Ctx argument is the current CSTNode's children.
selectClause(ctx) {
// Each Terminal or Non-Terminal in a grammar rule is collected into
// an array with the same name (key) in the ctx object.
let columns = ctx.Identifier.map((identToken) => identToken.image);
return {
type: "SELECT_CLAUSE",
columns: columns,
};
}
}
So far pretty simple. Now let's add another visit method for "selectStatement". First, let's recall its grammar.
selectStatement
: selectClause fromClause (whereClause)?
And now to the code:
class SQLToAstVisitor extends BaseSQLVisitor {
constructor() {
super();
this.validateVisitor();
}
// The Ctx argument is the current CSTNode's children.
selectClause(ctx) {
/* as above... */
}
selectStatement(ctx) {
// "this.visit" can be used to visit non-terminals and will invoke the correct visit method for the CstNode passed.
let select = this.visit(ctx.selectClause);
// "this.visit" can work on either a CstNode or an Array of CstNodes.
// If an array is passed (ctx.fromClause is an array) it is equivalent
// to passing the first element of that array
let from = this.visit(ctx.fromClause);
// "whereClause" is optional, "this.visit" will ignore empty arrays (optional)
let where = this.visit(ctx.whereClause);
return {
type: "SELECT_STMT",
selectClause: select,
fromClause: from,
whereClause: where,
};
}
}
Full Visitor
We still have a few grammar rules we need to build visitors for.
fromClause
: "FROM" Identifier
whereClause
: "WHERE" expression
expression
: atomicExpression relationalOperator atomicExpression
atomicExpression
: Integer | Identifier
relationalOperator
: ">" | "<"
Let's implement those as well.
class SQLToAstVisitor extends BaseSQLVisitor {
constructor() {
super();
this.validateVisitor();
}
selectStatement(ctx) {
/* as above... */
}
selectClause(ctx) {
/* as above... */
}
fromClause(ctx) {
const tableName = ctx.Identifier[0].image;
return {
type: "FROM_CLAUSE",
table: tableName,
};
}
whereClause(ctx) {
const condition = this.visit(ctx.expression);
return {
type: "WHERE_CLAUSE",
condition: condition,
};
}
expression(ctx) {
// Note the usage of the "rhs" and "lhs" labels defined in step 2 in the expression rule.
const lhs = this.visit(ctx.lhs[0]);
const operator = this.visit(ctx.relationalOperator);
const rhs = this.visit(ctx.rhs[0]);
return {
type: "EXPRESSION",
lhs: lhs,
operator: operator,
rhs: rhs,
};
}
// these two visitor methods will return a string.
atomicExpression(ctx) {
if (ctx.Integer) {
return ctx.Integer[0].image;
} else {
return ctx.Identifier[0].image;
}
}
relationalOperator(ctx) {
if (ctx.GreaterThan) {
return ctx.GreaterThan[0].image;
} else {
return ctx.LessThan[0].image;
}
}
}
Usage
So we know how to create a CST Visitor, but how do we actually use it?
// A new parser instance with CST output enabled.
const parserInstance = new SelectParser([], { outputCst: true });
// Our visitor has no state, so a single instance is sufficient.
const toAstVisitorInstance = new SQLToAstVisitor();
function toAst(inputText) {
// Lex
const lexResult = selectLexer.tokenize(inputText);
parserInstance.input = lexResult.tokens;
// Automatic CST created when parsing
const cst = parserInstance.selectStatement();
if (parserInstance.errors.length > 0) {
throw Error(
"Sad sad panda, parsing errors detected!\n" +
parserInstance.errors[0].message,
);
}
// Visit
const ast = toAstVisitorInstance.visit(cst);
return ast;
}