Grammar Well uses a stateful lexer. The lexer is optional but highly recommended, because it makes the grammar's production rules much easier to write. A lexer definition has two parts: a config section and the states. The config section must be placed at the top. Currently, the only configuration option is the optional start setting, which names the initial lexer state.
lexer {
    [string]
        - import singleQuoteString, doubleQuoteString
    [singleQuoteString]
        - when r:{'} tag "squote" highlight "string" goto singleQuoteStringEnd
    [singleQuoteStringEnd]
        - when r:{\\[\\\/bnrft]} tag "escaped"
        - when r:{\\'} tag "quoteEscape"
        - when r:{\\u[A-Fa-f\d]{4}} tag "escaped"
        - when r:{\\.} tag "badEscape"
        - when r:{[^'\\]+} tag "string" highlight "string"
        - when "'" tag "squote" highlight "string" pop
    [doubleQuoteString] span {
        [start]
            - when "\"" tag "dquote" highlight "string"
        [span]
            - when r:{\\[\\\/bnrft]} tag "escaped" highlight "constant"
            - when r:{\\"} tag "quoteEscape"
            - when r:{\\u[A-Fa-f\d]{4}} tag "escaped" highlight "constant"
            - when r:{\\.} tag "badEscape"
            - when r:{[^"\\]+} tag "string" highlight "string"
        [stop]
            - when "\"" tag "dquote" highlight "string"
    }
}
In the example above, we start with a state named string, followed by a list of rules, each introduced by a -. There are two types of rules: import rules and match rules. Order is important.
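Why order matters can be illustrated with a small Python sketch. It is not Grammar Well code: it assumes rules are tried top to bottom and the first match wins, and the helper name first_match is hypothetical. The two patterns are taken from the escape rules in the example above.

```python
import re

# Rules in declaration order: (compiled pattern, tag). The patterns mirror the
# escape rules of the example; the matching loop itself is an assumption.
RULES = [
    (re.compile(r"\\[\\/bnrft]"), "escaped"),
    (re.compile(r"\\."), "badEscape"),
]


def first_match(rules, text, pos=0):
    """Return (tag, matched_text) for the first rule that matches at pos."""
    for pattern, tag in rules:
        m = pattern.match(text, pos)
        if m:
            return tag, m.group(0)
    return None


print(first_match(RULES, r"\n"))                  # specific rule listed first
print(first_match(list(reversed(RULES)), r"\n"))  # catch-all listed first
```

With the catch-all rule listed first, a valid escape such as \n would be tagged badEscape, which is why the specific rules come before it in the example.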
Import
The import rule expects a comma-delimited list of states whose rules are to be imported into this state. This is a convenient way of keeping your rules DRY.
- import singleQuoteString, doubleQuoteString
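One way to picture an import is as in-place substitution: the import rule is replaced by the rules of the named states, in the order they are listed. The Python sketch below models that idea only. The data layout and the expand helper are hypothetical, and how Grammar Well resolves imports internally is not described here.

```python
# Hypothetical in-memory model of lexer states: each state is a list of rules,
# and a rule is either ("import", [state names]) or ("match", description).
STATES = {
    "string": [("import", ["singleQuoteString", "doubleQuoteString"])],
    "singleQuoteString": [("match", "single quote -> goto singleQuoteStringEnd")],
    "doubleQuoteString": [("match", "double-quoted span")],
}


def expand(states, name, seen=None):
    """Return the match rules of a state with import rules expanded in place."""
    seen = set() if seen is None else seen
    if name in seen:  # skip states already expanded (also stops import cycles)
        return []
    seen.add(name)
    rules = []
    for kind, payload in states[name]:
        if kind == "import":
            for imported in payload:
                rules.extend(expand(states, imported, seen))
        else:
            rules.append(payload)
    return rules


print(expand(STATES, "string"))
```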
Match
Match rules, as the name implies, declare what to match in the input stream.
- when r:{[^"\\]+} tag "string" highlight "string"
- when "\"" tag "dquote" highlight "string" pop
Directives
Name | Arguments | Notes | Behavior |
---|---|---|---|
when | string or regex | Required. Exclusive with before, skip | What to match in the input stream |
before | string or regex | Required. Exclusive with when, skip | What to match without consuming the input stream; should be used in conjunction with goto, pop, or inset |
skip | string or regex | Required. Exclusive with when, before | What to match without propagating it to the grammar; the matched text is ignored |
goto | word | Must be a valid state. Exclusive with pop, inset, stay | Moves to the named state and adds the current state onto the stack |
pop | number or none | Defaults to 1 if no argument is provided. Exclusive with goto, inset, stay | Pops 1 state, or the given number of states, off the stack |
inset | number or none | Defaults to 1 if no argument is provided. Exclusive with goto, pop, stay | Adds the current state onto the stack once, or the given number of times |
stay | none | Exclusive with goto, pop, inset | Prevents state switching when used in a span state |
tag | string(s), comma-separated | e.g. tag "tag1", "tag2", "tag3" | Applies one or more tags to the matched token; these can be referenced in the grammar |
highlight | string | | Not used directly, but can be used to help generate syntax highlighting |
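The stack behavior of goto, pop, and inset can be modeled with a few lines of Python. This is a toy model that reads the table literally, not Grammar Well's implementation. The class name StateStack and its methods are hypothetical.

```python
class StateStack:
    """Toy model of the lexer's state stack (not Grammar Well's own code)."""

    def __init__(self, start):
        self.current = start
        self.stack = []

    def goto(self, state):
        # Remember where we came from, then switch to the target state.
        self.stack.append(self.current)
        self.current = state

    def pop(self, count=1):
        # Take `count` states off the stack; the last one becomes current.
        for _ in range(count):
            self.current = self.stack.pop()

    def inset(self, count=1):
        # Push the current state `count` times without leaving it.
        for _ in range(count):
            self.stack.append(self.current)


lexer = StateStack("string")
lexer.goto("singleQuoteStringEnd")   # opening quote seen
print(lexer.current, lexer.stack)    # singleQuoteStringEnd ['string']
lexer.pop()                          # closing quote seen
print(lexer.current, lexer.stack)    # string []
```

In this model, inset followed by the same number of pops leaves the lexer in the state it started in, which is one plausible use for nested delimiters.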
Spans
A span is a lexer construct that defines a lexer state for a language fragment enclosed by a start delimiter and an end delimiter, with its own set of tokens in between.
For example, a string enclosed by quotation marks acts as a span, where the start and end quotes serve as delimiters, and the content inside may include special characters.
[doubleQuoteString] span {
    [start]
        - when "\"" tag "dquote" highlight "string"
    [span]
        - when r:{\\[\\\/bnrft]} tag "escaped" highlight "constant"
        - when r:{\\"} tag "quoteEscape"
        - when r:{\\u[A-Fa-f\d]{4}} tag "escaped" highlight "constant"
        - when r:{\\.} tag "badEscape"
        - when r:{[^"\\]+} tag "string" highlight "string"
    [stop]
        - when "\"" tag "dquote" highlight "string"
}
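To show what the three sections of this span do to an input, here is a minimal Python sketch that tokenizes one double-quoted string. It is an illustration under assumptions, not the lexer Grammar Well generates: the function name lex_double_quoted is hypothetical, the stop delimiter is checked before the span rules, and the span rules are tried in the order listed.

```python
import re

# Patterns copied from the [span] section above, in the same order.
SPAN_RULES = [
    (re.compile(r"\\[\\/bnrft]"), "escaped"),
    (re.compile(r'\\"'), "quoteEscape"),
    (re.compile(r"\\u[A-Fa-f\d]{4}"), "escaped"),
    (re.compile(r"\\."), "badEscape"),
    (re.compile(r'[^"\\]+'), "string"),
]


def lex_double_quoted(text):
    """Tokenize one double-quoted string as start, span body, stop."""
    if not text.startswith('"'):
        raise ValueError("expected opening quote")
    tokens = [("dquote", '"')]            # [start]
    pos = 1
    while pos < len(text):
        if text[pos] == '"':              # [stop]
            tokens.append(("dquote", '"'))
            return tokens
        for pattern, tag in SPAN_RULES:   # [span]
            m = pattern.match(text, pos)
            if m:
                tokens.append((tag, m.group(0)))
                pos = m.end()
                break
        else:
            raise ValueError(f"no rule matches at offset {pos}")
    raise ValueError("unterminated string")


print(lex_double_quoted(r'"a\nb"'))
```

For the input "a\nb" (with a literal backslash), the sketch yields a dquote token, a string token, an escaped token, another string token, and a closing dquote token.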