Grammar

The grammar section is heavily based on EBNF notation, with some additional syntax to help with generating output. If you aren't familiar with EBNF, it is worth starting there before continuing.

Rules

The basic syntax of a Rule is a Rule Name enclosed in [] brackets, followed by a sequence of symbols (combinations of Terminals and Non Terminals). Multiple Production Rules can be declared for a Rule Name by separating them with a |. Depending on the parsing algorithm you choose, there may be additional constraints on the Production Rules.
Consider this example:

[Start] 
    |  HelloGoodbye __ Target

[HelloGoodbye]
    | "Hello"
    | "Goodbye"

[Target] 
    | "World" 
    | r:{[a-zA-Z]+}

[__] 
    <ws>

There are 4 Non Terminals: Start, HelloGoodbye, Target, and __.

There are 5 Terminals: "Hello", "Goodbye", "World", r:{[a-zA-Z]+}, and <ws>.

You will also notice that the [Start] rule uses an optional leading pipe | while the [__] rule omits it. This is not a typo; it showcases the two accepted styles.
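To make the example concrete, here is a minimal hand-written sketch of which inputs the grammar above accepts. It is not how Grammar Well parses; it only mirrors the four rules with a single JavaScript regular expression. The function name matchesStart is hypothetical, and the sketch assumes that the <ws> token matches one or more whitespace characters, which depends on how your lexer defines it.

```typescript
// Hand-written recognizer mirroring the example grammar (illustration only).
// Assumption: the <ws> token matches one or more whitespace characters.
function matchesStart(input: string): boolean {
  // [HelloGoodbye]: "Hello" | "Goodbye" (case-sensitive literals)
  const helloGoodbye = "(?:Hello|Goodbye)";
  // [__]: <ws>
  const ws = "\\s+";
  // [Target]: "World" | r:{[a-zA-Z]+}
  const target = "(?:World|[a-zA-Z]+)";
  // [Start]: HelloGoodbye __ Target
  return new RegExp(`^${helloGoodbye}${ws}${target}$`).test(input);
}

console.log(matchesStart("Hello World"));   // true
console.log(matchesStart("Goodbye Alice")); // true
console.log(matchesStart("Hi World"));      // false: "Hi" is not a HelloGoodbye literal
console.log(matchesStart("HelloWorld"));    // false: the __ rule requires whitespace
```

Note that "World" is also matched by the regex alternative of [Target], so that input has two possible derivations in the grammar; the recognizer only reports whether at least one exists.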

Rule Name

The name of a rule must be a word.

Symbols

Non Terminals

Non Terminals are references to other production rules in the grammar, represented by a word (an unquoted identifier).

Example:

[Expression]
    | Term "+" Expression
    | Term

[Term]
    | "number"

Terminals

A Terminal is a literal, regular expression, or token found on the right-hand side of a production rule. Terminals are matched directly by the lexer and represent the basic symbols from which the language is constructed.

Examples:

Literal: "Hello"
Regex: r:{[a-z]+}
Token: <ws>

Literals

Literals are always double-quoted strings that are case-sensitively matched in the lexer stream. Whitespace and escape sequences inside the quotes are preserved.

"Hello"

Optionally, strings can be modified to allow case-insensitive matching by prepending the string with i::

i:"Hello"
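The difference between "Hello" and i:"Hello" can be sketched as follows. The helper matchLiteral is hypothetical and is not part of Grammar Well; it only illustrates case-sensitive versus case-insensitive comparison of a literal at a given position in the input.

```typescript
// Hypothetical helper: does `literal` occur in `input` starting at `offset`?
// `insensitive` plays the role of the i: prefix.
function matchLiteral(
  input: string,
  offset: number,
  literal: string,
  insensitive: boolean = false
): boolean {
  const slice = input.slice(offset, offset + literal.length);
  return insensitive
    ? slice.toLowerCase() === literal.toLowerCase()
    : slice === literal;
}

console.log(matchLiteral("hello world", 0, "Hello"));       // false: like "Hello"
console.log(matchLiteral("hello world", 0, "Hello", true)); // true: like i:"Hello"
console.log(matchLiteral("Hello world", 0, "Hello"));       // true
```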

Regular Expressions

Grammar Well supports Regular Expressions for matching patterns in the lexer. Regular expressions must be written using JavaScript's regex syntax, as Grammar Well is implemented in TypeScript and limited to JavaScript's regex capabilities.

Use the syntax r:{...} to define a regular expression:

r:{[a-zA-Z]+}

Note: Regex flags (such as i for case-insensitive) are not currently supported.
Some advanced features (like lookbehind) may not be available depending on your JavaScript engine.
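As a rough illustration, the text inside r:{...} can be thought of as the source of a JavaScript RegExp. The sketch below is an assumption about usage, not Grammar Well's internals: matchAt and supportsLookbehind are hypothetical helpers. It shows matching a pattern at a specific offset with the sticky flag, expressing case-insensitivity inside the pattern itself because flags are unavailable, and probing whether the current engine accepts lookbehind.

```typescript
// Hypothetical helper: match `source` against `input` exactly at `offset`.
function matchAt(source: string, input: string, offset: number): string | null {
  const re = new RegExp(source, "y"); // sticky: the match must begin at lastIndex
  re.lastIndex = offset;
  const m = re.exec(input);
  return m ? m[0] : null;
}

// Hypothetical probe: engines without lookbehind throw a SyntaxError here.
function supportsLookbehind(): boolean {
  try {
    new RegExp("(?<=a)b");
    return true;
  } catch {
    return false;
  }
}

console.log(matchAt("[a-zA-Z]+", "Hello World", 6)); // "World"
console.log(matchAt("[a-zA-Z]+", "Hello World", 5)); // null: offset 5 is a space

// Without the i flag, list both cases in character classes instead.
console.log(matchAt("[hH][eE][lL][lL][oO]", "HELLO", 0)); // "HELLO"

console.log(supportsLookbehind()); // result depends on the JavaScript engine
```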

Token Tags

Token Tags are set in the lexer and are matched in the grammar by wrapping the token tag in angle brackets.

Example:

<token>

Token tags allow you to reference lexer-defined tokens in your grammar rules by name.

Sub-Expressions

Symbols can be grouped with parentheses (), with each option separated by a |, to create an inline sub-expression.

The following are equivalent:

[RuleName]
    ("Hello" | "Goodbye") "World"

[RuleName]
    SubRule "World"
[SubRule]
    | "Hello" 
    | "Goodbye"

Quantifiers

A Quantifier refers to ?, *, and +, which can be appended to symbols to modify their quantity. These characters match the behavior commonly found in Regular Expressions and apply to the immediately preceding symbol.

Symbol	Quantity	Example
`?`	0 or 1	`"a"?`
`*`	0 or more	`"b"*`
`+`	1 or more	`"c"+`

For complex expressions, use parentheses to group symbols before applying a quantifier.
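One common way to reason about quantifiers is the textbook EBNF-to-BNF rewrite, where each quantified symbol becomes a helper rule. The sketch below shows that rewrite; it is not a claim about how Grammar Well expands quantifiers internally. The names expandQuantifier, Production, and the helper rule names are hypothetical, and the left-recursive form used for * and + may not suit every parsing algorithm.

```typescript
type Production = string[]; // a sequence of symbols; [] is the empty production
type Quantifier = "?" | "*" | "+";

// Textbook rewrite of `symbol` followed by a quantifier into a helper rule `name`.
function expandQuantifier(
  name: string,
  symbol: string,
  q: Quantifier
): Record<string, Production[]> {
  switch (q) {
    case "?": // 0 or 1: empty | symbol
      return { [name]: [[], [symbol]] };
    case "*": // 0 or more: empty | name symbol
      return { [name]: [[], [name, symbol]] };
    case "+": // 1 or more: symbol | name symbol
      return { [name]: [[symbol], [name, symbol]] };
  }
}

// `"b"*` becomes a helper rule with an empty production and a recursive one.
console.log(JSON.stringify(expandQuantifier("B_star", '"b"', "*")));
```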

Post Processors

In addition to the standard anatomy of EBNF, Grammar Well supports Post Processors. Post Processors are used to either evaluate or transform a matched production rule's values.

Types

Type	Example	Description
Array	`=> [ ...$0, $3.value ]`	JavaScript Array syntax
Expression	`=> ( JSON.parse($0.value) )`	JavaScript Expressions wrapped in parentheses
Function Body	`=> { return JSON.parse($0.value) }`	JavaScript Function body syntax
Interpolation	`=> ${ ({data}) => JSON.parse(data[0].value) }`	Interpolates content into the parser. The interpolated value is expected to be invokable

Positioning

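A Post Processor can immediately follow a rule name, in which case it applies to every production of that rule as a default that individual productions can override. It can also follow an individual production, in which case it applies only to that production.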

[RuleName] => ($0.value)  
    | "World"
    | "Goodbye"

[RuleName]
    | "World" => ($0.value)
    | "Goodbye" => ($0.value)
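The "default but overridable" behavior can be sketched as a simple lookup. The types RuleDef and ProductionDef and the function resolvePostprocess are hypothetical and do not reflect Grammar Well's actual data structures. The fallback of returning the matched data unchanged when no Post Processor is declared is also an assumption made for this sketch.

```typescript
type PostProcessor = (data: any[]) => unknown;

interface ProductionDef {
  symbols: string[];
  postprocess?: PostProcessor; // declared after an individual production
}

interface RuleDef {
  name: string;
  defaultPostprocess?: PostProcessor; // declared after the rule name
  productions: ProductionDef[];
}

// A production-level Post Processor wins; otherwise the rule default applies.
// Assumption: with neither present, the matched data is returned unchanged.
function resolvePostprocess(rule: RuleDef, production: ProductionDef): PostProcessor {
  return production.postprocess
    ?? rule.defaultPostprocess
    ?? ((data: any[]) => data);
}

const value: PostProcessor = (data) => data[0].value;
const upper: PostProcessor = (data) => String(data[0].value).toUpperCase();

const rule: RuleDef = {
  name: "RuleName",
  defaultPostprocess: value,
  productions: [
    { symbols: ['"World"'] },                       // inherits the default
    { symbols: ['"Goodbye"'], postprocess: upper }, // overrides the default
  ],
};

const matched = [{ value: "Goodbye" }];
console.log(resolvePostprocess(rule, rule.productions[0])(matched)); // Goodbye
console.log(resolvePostprocess(rule, rule.productions[1])(matched)); // GOODBYE
```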

Ordinal References

The JavaScript Template version expects a function body and is provided a variable named data. It also performs simple string replacement: any $ followed by a number is replaced with data[number]. The following two rules are equivalent:

[RuleName]
    "Hello" => ( $0.value )

[RuleName] 
    "Hello" => ${ ({data}) => data[0].value }
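The replacement of $ followed by a number with data[number] can be sketched as a plain string substitution followed by compiling the result into a function. This is a minimal sketch of the idea, not Grammar Well's implementation; compileExpression is a hypothetical name, and it assumes the Expression form, where the template is a single JavaScript expression.

```typescript
type PostProcessor = (args: { data: any[] }) => unknown;

// Hypothetical compiler for the Expression form, e.g. "$0.value".
function compileExpression(template: string): PostProcessor {
  // Replace every $<number> with data[<number>].
  const body = template.replace(
    /\$(\d+)/g,
    (_match: string, index: string) => `data[${index}]`
  );
  // Build a function of `data` that returns the rewritten expression.
  const fn = new Function("data", `return (${body});`);
  return ({ data }) => fn(data);
}

const getValue = compileExpression("$0.value");
console.log(getValue({ data: [{ value: "Hello" }] })); // Hello

const parseValue = compileExpression("JSON.parse($0.value)");
console.log(parseValue({ data: [{ value: "42" }] })); // 42
```

Because the substitution is purely textual, a $ followed by digits inside a string literal in the template would be rewritten too.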

Aliased References

Keeping track of the ordinal index of your symbols in an expression can be tedious, so Grammar Well also provides aliasing. Any symbol in an expression can be suffixed with @word. That word can then be referenced in the template.

[RuleName]
    "Hello"@hello => ( $hello.value )

[RuleName]
    "Hello" => ${ ({data}) => data[0].value }
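Aliases can be understood as names for ordinal positions. The sketch below maps each @word suffix to the index of its symbol and then rewrites $word in the template to data[index], alongside the numeric form. The function rewriteReferences is hypothetical, and the rule that an alias starts with a letter or underscore is an assumption made for this sketch.

```typescript
// Hypothetical rewrite of $alias and $<number> references into data[index].
// `symbols` holds the symbols of one production, e.g. ['"Hello"@hello', "Target"].
function rewriteReferences(symbols: string[], template: string): string {
  // Record which position each alias names.
  const aliasIndex = new Map<string, number>();
  symbols.forEach((symbol, index) => {
    const m = /@([A-Za-z_]\w*)$/.exec(symbol);
    if (m) aliasIndex.set(m[1], index);
  });

  return template.replace(
    /\$([A-Za-z_]\w*|\d+)/g,
    (match: string, ref: string) => {
      if (/^\d+$/.test(ref)) return `data[${ref}]`; // ordinal reference
      const index = aliasIndex.get(ref);
      // Unknown names are left untouched rather than guessed.
      return index === undefined ? match : `data[${index}]`;
    }
  );
}

console.log(rewriteReferences(['"Hello"@hello'], "$hello.value"));
// data[0].value

console.log(rewriteReferences(['"Hello"@greeting', "Target"], "$greeting.value + $1.value"));
// data[0].value + data[1].value
```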
