The grammar section is heavily based on EBNF notation, with some additional syntax to help with generating output. If you aren't familiar with EBNF, it is worth reading up on it before continuing.
Rules
The basic syntax of a Rule is a Rule Name enclosed in `[]` brackets, followed by a set of symbols (combinations of Terminals and Non Terminals). Multiple Production Rules can be declared for a Rule Name by separating them with a `|`. Depending on the parsing algorithm you choose, there may be additional constraints on the Production Rules.
Consider this example:
[Start]
| HelloGoodbye __ Target
[HelloGoodbye]
| "Hello"
| "Goodbye"
[Target]
| "World"
| r:{[a-zA-Z]+}
[__]
<ws>
There are 4 Non Terminals: `Start`, `HelloGoodbye`, `Target`, and `__`. There are 5 Terminals: `"Hello"`, `"Goodbye"`, `"World"`, `r:{[a-zA-Z]+}`, and `<ws>`.
You will also notice that the `[Start]` rule uses an optional leading pipe `|`, while the `[__]` rule omits it. This is not a typo; it shows that both styles are valid.
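To make the anatomy of a rule concrete, here is a minimal TypeScript sketch that models the example grammar as plain data and checks whether a token stream matches `Start`. The `Sym` and `Token` shapes, the `ws` tag, whole-token regex matching, and the naive backtracking recognizer are all assumptions made for illustration; this is not Grammar Well's internal representation, API, or parsing algorithm.

```typescript
// Hypothetical model of the example grammar; not Grammar Well's internal format.
type Sym =
  | { kind: "rule"; name: string }     // Non Terminal, e.g. HelloGoodbye
  | { kind: "literal"; value: string } // "Hello"
  | { kind: "regex"; pattern: RegExp } // r:{[a-zA-Z]+}, assumed to match the whole token
  | { kind: "tag"; tag: string };      // <ws>

// Assumed token shape produced by some lexer.
interface Token {
  value: string;
  tag?: string;
}

// Each Rule Name maps to its Production Rules (the `|` alternatives).
const grammar: Record<string, Sym[][]> = {
  Start: [[
    { kind: "rule", name: "HelloGoodbye" },
    { kind: "rule", name: "__" },
    { kind: "rule", name: "Target" },
  ]],
  HelloGoodbye: [[{ kind: "literal", value: "Hello" }], [{ kind: "literal", value: "Goodbye" }]],
  Target: [[{ kind: "literal", value: "World" }], [{ kind: "regex", pattern: /^[a-zA-Z]+$/ }]],
  __: [[{ kind: "tag", tag: "ws" }]],
};

// Returns every token position where a match of `sym` starting at `pos` can end.
function matchSymbol(sym: Sym, tokens: Token[], pos: number): number[] {
  if (sym.kind === "rule") return matchRule(sym.name, tokens, pos);
  const tok = tokens[pos];
  if (tok === undefined) return []; // ran out of input
  let ok: boolean;
  if (sym.kind === "literal") ok = tok.value === sym.value;
  else if (sym.kind === "regex") ok = sym.pattern.test(tok.value);
  else ok = tok.tag === sym.tag;
  return ok ? [pos + 1] : [];
}

// Tries every production of a rule, following all alternatives (naive backtracking).
function matchRule(name: string, tokens: Token[], pos: number): number[] {
  const ends: number[] = [];
  for (const production of grammar[name] || []) {
    let positions = [pos];
    for (const sym of production) {
      const next: number[] = [];
      for (const p of positions) next.push(...matchSymbol(sym, tokens, p));
      positions = next;
    }
    ends.push(...positions);
  }
  return ends;
}

// The input is accepted if Start can consume every token.
function accepts(tokens: Token[]): boolean {
  return matchRule("Start", tokens, 0).indexOf(tokens.length) !== -1;
}
```

Note that `"World"` satisfies both `Target` alternatives, so this recognizer finds two matches for `Hello World`. How such ambiguity is treated is one of the ways the choice of parsing algorithm can constrain your Production Rules.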
Rule Name
The name of a rule must be a word.
Symbols
Non Terminals
Non Terminals, represented by a word, are references to another production rule in the grammar.
Terminals
A Terminal is a literal, regular expression, or token tag found on the right-hand side of a production rule. Terminals are what lexer tokens are evaluated against.
Literals
Literals are double-quoted strings that are matched case-sensitively against the lexer stream.
"Hello"
Optionally, a string can be matched case-insensitively by prefixing it with `i:`.
i:"Hello"
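As a rough illustration of the difference between `"Hello"` and `i:"Hello"`, the sketch below compares a literal against a token's text. `LiteralTerminal` and `matchesLiteral` are hypothetical names, not part of Grammar Well.

```typescript
// Hypothetical literal terminal; `caseInsensitive` stands in for the `i:` prefix.
interface LiteralTerminal {
  value: string;
  caseInsensitive: boolean;
}

function matchesLiteral(terminal: LiteralTerminal, tokenText: string): boolean {
  if (!terminal.caseInsensitive) {
    // "Hello": the token text must be identical, including case.
    return tokenText === terminal.value;
  }
  // i:"Hello": compare after lower-casing both sides.
  return tokenText.toLowerCase() === terminal.value.toLowerCase();
}

const hello: LiteralTerminal = { value: "Hello", caseInsensitive: false };       // "Hello"
const helloAnyCase: LiteralTerminal = { value: "Hello", caseInsensitive: true }; // i:"Hello"
```

Lower-casing both sides is the simplest approach but is not full Unicode case folding (`"ß"` and `"SS"` do not compare equal this way), so treat it as an approximation rather than Grammar Well's exact rule.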
Regular Expressions
Grammar Well supports Regular Expressions, prefixed with `r:` and wrapped in braces. Because Grammar Well is written in TypeScript, it is limited to the capabilities of JavaScript's regular expressions.
r:{[a-zA-Z]+}
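Because patterns are handled by the JavaScript engine, one way to picture an `r:{...}` terminal is as the source of a `RegExp`. The sketch below is hypothetical: `compileRegexTerminal` assumes the pattern must match the entire token text (hence the added `^(?:...)$`), which the documentation does not state. `isValidJsRegex` shows the practical limit: syntax from other regex flavors, such as the possessive quantifier `a++`, is rejected by JavaScript.

```typescript
// Hypothetical helper: turn the body of an r:{...} terminal into a JavaScript RegExp.
// Assumption (not confirmed by the docs): the pattern must match the whole token text.
function compileRegexTerminal(source: string): RegExp {
  const m = /^r:\{([\s\S]*)\}$/.exec(source);
  if (m === null) throw new Error(`Not a regex terminal: ${source}`);
  // The body goes straight to the JavaScript RegExp engine, so only JS syntax works.
  return new RegExp(`^(?:${m[1]})$`);
}

// True when the JavaScript engine accepts `body` as a regular expression.
function isValidJsRegex(body: string): boolean {
  try {
    new RegExp(body);
    return true;
  } catch {
    return false; // e.g. possessive quantifiers such as `a++` throw a SyntaxError
  }
}
```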
Token Tags
Token Tags are set in the lexer and are matched in the grammar by wrapping the token tag in angle brackets.
<token>
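The sketch below assumes a lexer that attaches one or more tags to each token; a `<tag>` terminal then matches any token carrying that tag, whatever its text. The `TaggedToken` shape, the `lex` function, and the `word`/`ws` tag names are hypothetical and only illustrate the idea; Grammar Well's lexer defines tags in its own way.

```typescript
// Hypothetical token shape: the lexer attaches tags to each token it produces.
interface TaggedToken {
  value: string;
  tags: string[];
}

// Toy lexer: runs of letters are tagged "word", runs of whitespace "ws".
// Characters matching neither pattern are skipped (a real lexer would report them).
function lex(input: string): TaggedToken[] {
  const tokens: TaggedToken[] = [];
  const re = /([a-zA-Z]+)|(\s+)/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(input)) !== null) {
    tokens.push({ value: m[0], tags: m[1] !== undefined ? ["word"] : ["ws"] });
  }
  return tokens;
}

// A <tag> terminal matches any token carrying that tag, regardless of its text.
function matchesTag(tag: string, token: TaggedToken): boolean {
  return token.tags.indexOf(tag) !== -1;
}
```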
Sub-Expressions
Symbols can be grouped with `()`, separating each option with a `|`, to create an inline sub-expression.
The following are equivalent:
[RuleName]
("Hello" | "Goodbye") "World"
[RuleName]
SubRule "World"
[SubRule]
| "Hello"
| "Goodbye"
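The equivalence above amounts to lifting each parenthesized group into a generated rule. Here is a minimal sketch of that rewrite in TypeScript. The `Item` representation and the `RuleName$sub0` naming scheme are invented for illustration and do not reflect how Grammar Well names or stores its generated rules; nested groups are not handled.

```typescript
// Hypothetical AST: a production is a list of items; an item is either a plain
// symbol (kept as its source text) or an inline group of alternatives.
type Item = string | { alternatives: string[][] };

interface Grammar {
  [ruleName: string]: string[][];
}

// Replaces every inline group with a reference to a newly generated sub-rule.
function desugarGroups(ruleName: string, productions: Item[][]): Grammar {
  const out: Grammar = {};
  let counter = 0;
  const rewritten = productions.map((production) =>
    production.map((item) => {
      if (typeof item === "string") return item;
      // Lift the group's alternatives into a new rule and reference it by name.
      const subName = `${ruleName}$sub${counter++}`;
      out[subName] = item.alternatives;
      return subName;
    })
  );
  out[ruleName] = rewritten;
  return out;
}
```

For `("Hello" | "Goodbye") "World"`, this produces `RuleName` as `RuleName$sub0 "World"` and `RuleName$sub0` with the alternatives `"Hello"` and `"Goodbye"`, mirroring the second grammar above.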
Quantifiers
A Quantifier is one of `?`, `*`, or `+`, appended to a symbol. These characters behave as they commonly do in Regular Expressions:
Symbol | Quantity | Example |
---|---|---|
`?` | 0 or 1 | `"a"?` |
`*` | 0 or more | `"b"*` |
`+` | 1 or more | `"c"+` |
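Quantifiers are usually implemented by rewriting them into plain rules. The sketch below shows the textbook EBNF-to-BNF expansion, where `[]` is the empty production and `helper` is a generated rule name; it is a generic technique, not necessarily the expansion Grammar Well performs. `derivableCounts` checks that each expansion derives exactly the repetition counts listed in the table.

```typescript
type Quantifier = "?" | "*" | "+";

// Textbook EBNF-to-BNF expansion: a quantified symbol becomes a helper rule.
// [] is the empty production. Generic sketch; Grammar Well's expansion may differ.
function expandQuantifier(symbol: string, q: Quantifier, helper: string): string[][] {
  if (q === "?") return [[], [symbol]];         // 0 or 1
  if (q === "*") return [[], [helper, symbol]]; // 0 or more (left-recursive)
  return [[symbol], [helper, symbol]];          // "+": 1 or more (left-recursive)
}

// Which repetition counts 0..max of `symbol` can the helper rule derive?
// Assumes each production references `helper` at most once, as above.
function derivableCounts(productions: string[][], symbol: string, helper: string, max: number): number[] {
  const can: boolean[] = [];
  for (let n = 0; n <= max; n++) {
    can[n] = productions.some((p) => {
      const direct = p.filter((s) => s === symbol).length;
      if (p.indexOf(helper) === -1) return direct === n;
      const rest = n - direct; // copies the recursive reference must supply
      return rest >= 0 && rest < n && can[rest];
    });
  }
  const counts: number[] = [];
  for (let n = 0; n <= max; n++) if (can[n]) counts.push(n);
  return counts;
}
```

The helper rules here are left-recursive, which suits bottom-up and Earley-style parsers; a recursive-descent parser would need the right-recursive form (`[symbol, helper]`) instead.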
Post Processors
In addition to standard EBNF, Grammar Well supports Post Processors, which either evaluate or transform the values of a matched production rule.
Types
Type | Example | Description |
---|---|---|
Array | `=> [ ...$0, $3.value ]` | JavaScript Array syntax |
Expression | `=> ( JSON.parse($0.value) )` | JavaScript expression wrapped in parentheses |
Function Body | `=> { return JSON.parse($0.value) }` | JavaScript function body syntax |
Interpolation | `=> ${ ({data}) => JSON.parse(data[0].value) }` | Interpolates content into the parser; the interpolated value is expected to be invokable |
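To show how the three JavaScript template forms differ, here is a hypothetical compiler that turns each one into a callable: an Array or Expression is returned as a value, while a Function Body is used as-is. It uses `new Function` only so the sketch can run; a parser generator would more likely emit the code into the generated parser. Telling the forms apart by their first character is an assumption, and the Interpolation form needs no compilation because it is already a function.

```typescript
type PostProcessor = (data: unknown[]) => unknown;

// Hypothetical compiler for the Array, Expression and Function Body forms.
// Assumptions: the form is identified by its first character, and `$n` is
// rewritten to `data[n]` by naive text substitution (string literals are not skipped).
function compilePostProcessor(template: string): PostProcessor {
  const src = template.trim().replace(/\$(\d+)/g, "data[$1]");
  let body: string;
  if (src.charAt(0) === "{") {
    // Function Body: strip the braces and use the statements as-is.
    body = src.slice(1, -1);
  } else if (src.charAt(0) === "[" || src.charAt(0) === "(") {
    // Array or Expression: the whole template is a value to return.
    body = `return ${src};`;
  } else {
    throw new Error(`Unrecognized post processor: ${template}`);
  }
  // `new Function` only makes the sketch runnable; generated parsers would inline the code.
  return new Function("data", body) as PostProcessor;
}
```

For example, `compilePostProcessor("( JSON.parse($0.value) )")` yields a function that, given `[{ value: "[1,2]" }]`, returns the array `[1, 2]`.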
Positioning
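A Post Processor can immediately follow a rule name, in which case it is the default for every production of that rule; an individual production can still override it with its own Post Processor. The following are equivalent: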
[RuleName] => ($0.value)
| "World"
| "Goodbye"
[RuleName]
| "World" => ($0.value)
| "Goodbye" => ($0.value)
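The resolution rule can be sketched as follows: a production uses its own Post Processor if it has one and otherwise inherits the default written after the rule name. The `Rule` and `Production` shapes are hypothetical, and Post Processors are kept as plain source text for simplicity. The two constants mirror the examples above.

```typescript
// Hypothetical shapes; the Post Processor is kept as its source text.
interface Production {
  symbols: string[];
  postprocessor?: string;
}
interface Rule {
  name: string;
  postprocessor?: string; // default, written right after the rule name
  productions: Production[];
}

// A production's own Post Processor wins; otherwise it inherits the rule's default.
function effectivePostprocessors(rule: Rule): Array<string | undefined> {
  return rule.productions.map((p) =>
    p.postprocessor !== undefined ? p.postprocessor : rule.postprocessor
  );
}

// [RuleName] => ($0.value)  with  | "World"  | "Goodbye"
const withDefault: Rule = {
  name: "RuleName",
  postprocessor: "($0.value)",
  productions: [{ symbols: ['"World"'] }, { symbols: ['"Goodbye"'] }],
};

// [RuleName]  with  | "World" => ($0.value)  | "Goodbye" => ($0.value)
const perProduction: Rule = {
  name: "RuleName",
  productions: [
    { symbols: ['"World"'], postprocessor: "($0.value)" },
    { symbols: ['"Goodbye"'], postprocessor: "($0.value)" },
  ],
};
```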
Ordinal References
The JavaScript template version expects a function body and is provided a variable `data`. It also performs simple string replacement: any `$` followed by a number is replaced with `data[number]`, so `$0` becomes `data[0]`. For example, the following are equivalent:
[RuleName]
"Hello" => ( $0.value )
[RuleName]
"Hello" => ${ ({data}) => data[0].value }
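A minimal sketch of the ordinal replacement, assuming it is a plain regular-expression substitution as described. `replaceOrdinals` is a hypothetical name, and the `new Function` call exists only to make the template runnable here; it shows that both example forms read the same value.

```typescript
// Sketch of the ordinal replacement: every `$` followed by digits becomes `data[digits]`.
// Being plain text substitution, it would also rewrite a `$1` inside a string literal.
function replaceOrdinals(template: string): string {
  return template.replace(/\$(\d+)/g, (_match: string, index: string) => `data[${index}]`);
}

// `=> ( $0.value )`, turned into a callable with `new Function` just to run it here.
const fromTemplate = new Function("data", `return ${replaceOrdinals("( $0.value )")};`) as (
  data: Array<{ value: string }>
) => unknown;

// `=> ${ ({data}) => data[0].value }`, already a function that receives `{ data }`.
const fromInterpolation = ({ data }: { data: Array<{ value: string }> }) => data[0].value;
```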
Aliased References
Keeping track of the ordinal index of every symbol in an expression can be tedious, so Grammar Well also provides aliasing. Any symbol in an expression can be suffixed with `@word`, and that word can then be referenced in the template as `$word`. For example, the following are equivalent:
[RuleName]
"Hello"@hello => ( $hello.value )
[RuleName]
"Hello" => ${ ({data}) => data[0].value }
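Aliases can be thought of as a lookup from each `@word` to the position of the symbol it is attached to, after which `$word` is rewritten just like an ordinal reference. The sketch below is hypothetical: the `AliasedSymbol` shape and the rule that an alias starts with a letter or `_` are assumptions, since the exact definition of a word is not given here.

```typescript
// Hypothetical production model: each symbol may carry an `@alias`.
interface AliasedSymbol {
  text: string;
  alias?: string;
}

// Rewrites `$alias` into `data[i]`, where i is the position of the symbol that
// declared `@alias`. Assumes an alias is a letter or `_` followed by word characters,
// so ordinal references such as `$0` are left alone. Unknown aliases are kept as-is.
function resolveAliases(symbols: AliasedSymbol[], template: string): string {
  const positions: { [alias: string]: number } = {};
  symbols.forEach((sym, i) => {
    if (sym.alias !== undefined) positions[sym.alias] = i;
  });
  return template.replace(/\$([A-Za-z_]\w*)/g, (match: string, name: string) =>
    Object.prototype.hasOwnProperty.call(positions, name) ? `data[${positions[name]}]` : match
  );
}
```

With `"Hello"@hello`, the template `( $hello.value )` becomes `( data[0].value )`, the same code the ordinal form `( $0.value )` produces.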