Grammar Well uses a stateful lexer. The lexer is optional but highly recommended, because it makes the grammar's production rules considerably easier to write. A lexer definition has two parts: configuration and states. The configuration must come first.
Currently, the only configuration option is the optional start setting, which determines the initial lexer state.
A lexer state is a named collection of rules that define how to tokenize input when in that state.
lexer {
    [string]
        - import singleQuoteString, doubleQuoteString
    [singleQuoteString]
        - when r:{'} tag "squote" highlight "string" goto singleQuoteStringEnd
    [singleQuoteStringEnd]
        - when r:{\\[\\\/bnrft]} tag "escaped"
        - when r:{\\'} tag "quoteEscape"
        - when r:{\\u[A-Fa-f\d]{4}} tag "escaped"
        - when r:{\\.} tag "badEscape"
        - when r:{[^'\\]+} tag "string" highlight "string"
        - when "'" tag "squote" highlight "string" pop
    [doubleQuoteString] span {
        [start]
            - when "\"" tag "dquote" highlight "string"
        [span]
            - when r:{\\[\\\/bnrft]} tag "escaped" highlight "constant"
            - when r:{\\"} tag "quoteEscape"
            - when r:{\\u[A-Fa-f\d]{4}} tag "escaped" highlight "constant"
            - when r:{\\.} tag "badEscape"
            - when r:{[^"\\]+} tag "string" highlight "string"
        [stop]
            - when "\"" tag "dquote" highlight "string"
    }
}
In the above example, we start with a state named string, followed by a list of rules, each prefixed with -. There are two types of rules: import rules and matching rules.
Order is important: rules are evaluated from top to bottom.
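To see why rule order matters, here is a minimal Python sketch of first-match-wins evaluation. It is a model for illustration only, not Grammar Well's implementation; the names RULES, SWAPPED, and first_match are hypothetical, and the patterns are borrowed from the string example above.

```python
import re

# Hypothetical model of one lexer state: an ordered list of (pattern, tag) rules.
RULES = [
    (re.compile(r'\\[\\/bnrft]'), "escaped"),    # specific escapes such as \n
    (re.compile(r'\\.'), "badEscape"),           # any other backslash sequence
    (re.compile(r'[^"\\]+'), "string"),          # plain text
]

# The same rules with the catch-all escape rule moved to the top.
SWAPPED = [RULES[1], RULES[0], RULES[2]]


def first_match(rules, text, pos=0):
    """Return (tag, lexeme) for the first rule, top to bottom, that matches at pos."""
    for pattern, tag in rules:
        m = pattern.match(text, pos)
        if m:
            return tag, m.group()
    return None


# r'\n' is the two characters backslash and n.
print(first_match(RULES, r'\n')[0])    # the specific rule is reached first
print(first_match(SWAPPED, r'\n')[0])  # the catch-all shadows the specific rule
```

With the catch-all rule listed first, the specific escape rule is never reached, which is why the more general pattern belongs lower in the list.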
Import
The import rule expects a comma-separated list of states whose rules are to be imported into this state. This is a convenient way of keeping your rules DRY.
- import singleQuoteString, doubleQuoteString
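One way to picture an import is as inlining: the imported state's rules are copied in where the import appears. The Python sketch below models that under two assumptions that the text above does not confirm: that imported rules take the position of the import line, and that imports can nest. STATES and expand are hypothetical names, and each quoted-string state is reduced to a single rule for brevity.

```python
# Hypothetical in-memory form of the lexer states. Each state maps to an ordered
# list of entries: ("import", [state names]) or ("when", pattern, tag).
STATES = {
    "string": [("import", ["singleQuoteString", "doubleQuoteString"])],
    "singleQuoteString": [("when", "'", "squote")],
    "doubleQuoteString": [("when", '"', "dquote")],
}


def expand(states, name, seen=()):
    """Flatten a state's rules, replacing each import with the imported rules."""
    if name in seen:
        raise ValueError(f"circular import involving {name!r}")
    rules = []
    for entry in states[name]:
        if entry[0] == "import":
            for imported in entry[1]:
                # Assumption: imported rules are inlined at the import's position.
                rules.extend(expand(states, imported, seen + (name,)))
        else:
            rules.append(entry)
    return rules


flat = expand(STATES, "string")
```

Here flat holds the single-quote rule followed by the double-quote rule, so the string state behaves as if both rules had been written in it directly.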
Match
Match rules, as the name implies, declare what to match in the input stream.
- when r:{[^"\\]+} tag "string" highlight "string"
- when "\"" tag "dquote" highlight "string" pop
Directives
Name | Arguments | Notes | Behavior |
---|---|---|---|
when | string or regex | Required. Exclusive with before, skip | What to match in the input stream |
before | string or regex | Required. Exclusive with when, skip | What to match without consuming the input stream; should be used in conjunction with goto, pop, or inset |
skip | string or regex | Required. Exclusive with when, before | What to match without propagating it to the grammar; the matched text is ignored |
goto | word | Must be a valid state. Exclusive with pop, inset, stay | Moves to the named state and adds the current state onto the stack |
pop | number or none | Defaults to 1 if no argument is provided. Exclusive with goto, inset, stay | Pops 1 state, or the given number of states, off the stack |
inset | number or none | Defaults to 1 if no argument is provided. Exclusive with goto, pop, stay | Adds the current state onto the stack once, or the given number of times |
stay | none | Exclusive with goto, pop, inset | Prevents state switching when used in a span state |
tag | string(s), comma separated | ex: tag "tag1", "tag2", "tag3" | Applies 1 or more tags to the matched token; these can be referenced in the grammar |
highlight | string | | Not used directly, but can be used to help generate syntax highlighting |
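The stack behavior of goto, pop, and inset can be modeled in a few lines of Python. This is a sketch based only on the descriptions in the table; apply_transition is a hypothetical helper, and the assumption that pop makes the most recently pushed state current again is an interpretation, not a documented guarantee.

```python
def apply_transition(stack, current, directive=None, arg=None):
    """Return (new_stack, new_current) after a rule with the given directive matches.

    stack holds the previously active states; current is the active state.
    """
    stack = list(stack)  # copy so the caller's list is left untouched
    if directive == "goto":
        # Remember where we came from, then move to the named state.
        stack.append(current)
        return stack, arg
    if directive == "pop":
        # Assumption: each pop restores the most recently pushed state.
        for _ in range(1 if arg is None else arg):
            current = stack.pop()
        return stack, current
    if directive == "inset":
        # Push the current state one or more times without leaving it.
        stack.extend([current] * (1 if arg is None else arg))
        return stack, current
    # No transition directive: the state does not change.
    return stack, current


# Walk through the single-quoted string example: the opening quote uses goto,
# the closing quote uses pop.
stack, state = apply_transition([], "string", "goto", "singleQuoteStringEnd")
after_open = (stack, state)
stack, state = apply_transition(stack, state, "pop")
after_close = (stack, state)
```

In this model after_open is (["string"], "singleQuoteStringEnd") and after_close is ([], "string"). Popping from an empty stack raises IndexError here; how the real lexer treats that case is not covered by the table.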
Spans
A span is a lexer construct that defines a state for a language fragment enclosed by a start delimiter and a stop delimiter, with its own set of rules for the content in between.
For example, a quoted string acts as a span: the opening and closing quotes are the delimiters, and the content between them may include special characters such as escape sequences.
[doubleQuoteString] span {
    [start]
        - when "\"" tag "dquote" highlight "string"
    [span]
        - when r:{\\[\\\/bnrft]} tag "escaped" highlight "constant"
        - when r:{\\"} tag "quoteEscape"
        - when r:{\\u[A-Fa-f\d]{4}} tag "escaped" highlight "constant"
        - when r:{\\.} tag "badEscape"
        - when r:{[^"\\]+} tag "string" highlight "string"
    [stop]
        - when "\"" tag "dquote" highlight "string"
}
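The three sections of the span above can be traced with a small Python model. It assumes that a match in start enters the span body, that the stop rules are tried before the body rules at each position, and that a match in stop ends the span; these are assumptions about the semantics, not a description of Grammar Well's internals. START, SPAN, STOP, match_first, and lex_span are hypothetical names.

```python
import re

# The rules of the doubleQuoteString span, as (pattern, tag) pairs.
START = [(re.compile(r'"'), "dquote")]
SPAN = [
    (re.compile(r'\\[\\/bnrft]'), "escaped"),
    (re.compile(r'\\"'), "quoteEscape"),
    (re.compile(r'\\u[A-Fa-f\d]{4}'), "escaped"),
    (re.compile(r'\\.'), "badEscape"),
    (re.compile(r'[^"\\]+'), "string"),
]
STOP = [(re.compile(r'"'), "dquote")]


def match_first(rules, text, pos):
    """Return (tag, lexeme) for the first rule that matches at pos, else None."""
    for pattern, tag in rules:
        m = pattern.match(text, pos)
        if m:
            return tag, m.group()
    return None


def lex_span(text):
    """Tokenize one double-quoted string: start delimiter, body, stop delimiter."""
    hit = match_first(START, text, 0)
    if hit is None:
        raise ValueError("input does not begin with the start delimiter")
    tokens, pos = [hit], len(hit[1])
    while pos < len(text):
        hit = match_first(STOP, text, pos)   # assumption: stop is tried first
        if hit is not None:
            tokens.append(hit)
            return tokens
        hit = match_first(SPAN, text, pos)
        if hit is None:
            raise ValueError(f"no rule matches at offset {pos}")
        tokens.append(hit)
        pos += len(hit[1])
    raise ValueError("unterminated span")


tags = [tag for tag, _ in lex_span(r'"a\nb\"c"')]
```

For the input "a\nb\"c" (with literal backslashes), tags comes out as dquote, string, escaped, string, quoteEscape, string, dquote: the escaped quote stays inside the span because the quoteEscape rule consumes the backslash and the quote together.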