Grammar Syntax Reference¶
This page is a complete reference for Grammar-Kit's BNF grammar syntax. For a tutorial-style introduction, see Grammar Syntax.
File Structure¶
A .bnf file consists of an optional attribute header block followed by rule definitions:
{
// attribute header block
parserClass="com.example.MyParser"
}
// rule definitions
root ::= item *
item ::= id '=' value
The formal structure:
grammar ::= grammar_element *
grammar_element ::= attrs | rule
Rules¶
A rule associates a name with an expression. It may have modifiers, inline attributes, and an optional trailing semicolon:
rule ::= modifier* id '::=' expression attrs? ';'?
Rule modifiers¶
| Modifier | Effect |
|---|---|
| private | No AST node generated. Child nodes fold into the parent. |
| external | No parsing code generated. The parse function is hand-written. |
| meta | Parametrized rule that takes parse functions as arguments. |
| left | Takes the previous sibling and becomes its parent (for left-associative operators). |
| inner | Used with left. Takes the previous sibling and becomes its child. |
| upper | Takes the parent node and replaces it. |
| fake | Only PSI classes are generated. No parsing code is produced. |
Multiple modifiers can combine on a single rule:
private meta list_of ::= <<p>> (',' <<p>>) *
left inner assign_expr ::= '=' expr
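The fragment below is a minimal sketch of two modifiers used on their own. The rule names (add_expr, primary, heredoc) and the method name parseHeredoc are hypothetical, and the tree shape described in the comments assumes the behavior listed in the table above.

```bnf
// Hypothetical rule names, for illustration only.
expr ::= primary add_expr *

// Each add_expr wraps the node to its left, so "1 + 2 + 3"
// nests as ((1 + 2) + 3) instead of a flat list.
left add_expr ::= '+' primary
primary ::= number

// An external rule names a hand-written parse method instead of
// an expression; parseHeredoc is an assumed method name.
external heredoc ::= parseHeredoc
```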
Expressions¶
The right-hand side of a rule is an expression built from sequences, choices, and quantified terms:
expression ::= sequence ('|' sequence)*
sequence ::= option+
option ::= predicate | quantified | paren_expr | simple
Choices¶
The | operator separates alternatives. The parser tries each branch in order:
value ::= number | string | object | array
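Because branches are tried in order and the first match wins, a branch that is a prefix of a later branch can shadow it. The sketch below uses hypothetical rule names to show the problem and one way to avoid it.

```bnf
// Hypothetical rules illustrating ordered choice.

// Problematic: the first branch succeeds on any id, so the longer
// second branch is never tried.
shadowed_ref ::= id | id '.' id

// Safer: list the longer alternative first.
ordered_ref ::= id '.' id | id
```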
Sequences¶
Adjacent terms form a sequence that must match in order:
pair ::= key ':' value
Quantifiers¶
| Operator | Meaning | Example |
|---|---|---|
| ? | Zero or one (optional) | ';'? |
| * | Zero or more | item * |
| + | One or more | item + |
item_list ::= item (',' item) *
optional_semi ::= ';'?
arguments ::= expr (',' expr) +
Predicates¶
Predicates test whether an expression matches at the current position without consuming input, which makes them suitable for lookahead:
| Operator | Meaning | Example |
|---|---|---|
| & | Positive lookahead (succeeds if the expression matches) | &'}' |
| ! | Negative lookahead (succeeds if the expression does not match) | !'}' |
private item_recover ::= !(")" | ",")
private items ::= [!")" item (',' item) *]
Grouping¶
Parentheses ( )¶
Parentheses group a subexpression so that a quantifier or choice applies to the whole group:
list ::= '(' item (',' item) * ')'
Brackets [ ]¶
Brackets denote an optional group, equivalent to (...)?:
// These two forms are equivalent:
optional_items ::= [item (',' item) *]
optional_items ::= (item (',' item) *)?
Braces { } in expressions¶
Within a rule's expression, braces are an alternative grouping syntax that works like parentheses. Elsewhere, braces delimit attribute blocks.
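As a hedged illustration, the two hypothetical rules below should group the same way. This assumes the braced content does not begin like an attribute assignment (an identifier followed by =), since that form would read as an attribute block.

```bnf
// Hypothetical rule name; both forms are assumed to be equivalent.
item_run ::= { item + (',' item) * }
item_run ::= ( item + (',' item) * )
```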
Tokens and Literals¶
String literals¶
Quoted strings match literal text. Both single and double quotes work:
plus ::= '+'
keyword ::= "while"
Token references¶
Unquoted identifiers reference other rules or declared tokens:
expr ::= number PLUS number
Token declarations¶
The tokens attribute in the header block declares tokens with optional values:
{
tokens = [
id="regexp:\w+" // regexp token (regexp: prefix)
string // name only
PLUS_OP="+" // text-matched token
SWITCH="switch" // keyword token
]
}
Tokens have three categories:
- Regexp tokens use the regexp: prefix and define a lexer pattern. Required for Live Preview.
- Text-matched tokens have a quoted string value (e.g., PLUS_OP="+").
- Name-only tokens have no value and are matched by the lexer based on external configuration.
External Expressions¶
External expressions invoke methods not defined in the grammar. They are enclosed in << >>:
root ::= <<parseRoot item>>
meta comma_list ::= <<p>> (',' <<p>>) *
usage ::= <<comma_list expr>>
The first identifier inside << >> is the method name. Subsequent items are arguments passed to it. External expressions work with meta rules to implement parametrized parsing.
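The sketch below shows one way the pieces might fit together. The class name com.example.MyParserUtil and the rule name literal_list are assumptions for illustration. A hand-written method such as parseRoot is typically a static Java method that takes the PsiBuilder, an int level, and one Parser argument per grammar argument.

```bnf
{
  // Assumed utility class holding hand-written parse methods.
  parserUtilClass="com.example.MyParserUtil"
}

// parseRoot is looked up outside the grammar; item is passed to it
// as a parse function.
root ::= <<parseRoot item>>

// A meta rule refers to its parameter as <<p>>. The argument can be
// a parenthesized expression, not just a rule name.
meta comma_list ::= <<p>> (',' <<p>>) *
literal_list ::= <<comma_list (number | string)>>
```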
Attribute Blocks¶
Attribute blocks appear in curly braces. The header block at the top of the file sets global attributes. Inline attribute blocks on rules set rule-level attributes:
{
parserClass="com.example.MyParser"
extends(".*_expr")=expr
}
item ::= number {pin=1 recoverWhile=item_recover}
Attribute syntax:
attrs ::= '{' attr* '}'
attr ::= id attr_pattern? '=' attr_value ';'?
attr_pattern ::= '(' string ')'
attr_value ::= string | number | boolean | value_list | id
value_list ::= '[' list_entry* ']'
list_entry ::= (id ('=' string)? | string) ';'?
For the complete list of attributes, see Attribute Reference.
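To connect the attribute syntax to concrete values, the sketch below uses each kind of attr_value once. The attribute values and the property rule are illustrative assumptions, not a recommended configuration.

```bnf
{
  parserClass="com.example.MyParser"   // string value
  generateTokenAccessors=true          // boolean value
  extends(".*_expr")=expr              // attr_pattern with an id value
  tokens=[                             // value_list
    COMMA=","                          //   id '=' string entry
    number                             //   id-only entry
  ]
}

// number value in an inline block: the rule is pinned once '=' matches
property ::= id '=' value {pin=2}
```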
Comments¶
Grammar-Kit supports both line comments and block comments:
// Line comment: everything after // to end of line
/* Block comment:
can span multiple lines */
Operators Summary¶
| Symbol | Name | Meaning |
|---|---|---|
| ::= | Definition | Defines a rule |
| \| | Choice | Separates alternatives |
| ? | Optional | Zero or one |
| * | Repetition | Zero or more |
| + | One-or-more | One or more |
| & | And-predicate | Positive lookahead |
| ! | Not-predicate | Negative lookahead |
| = | Assignment | Assigns an attribute value |
| ( ) | Parentheses | Groups expressions |
| [ ] | Brackets | Optional group (same as (...)?) |
| { } | Braces | Attribute blocks |
| << >> | External | External expression call |
| // | Line comment | Comment to end of line |
| /* */ | Block comment | Multi-line comment |
| ; | Semicolon | Optional statement terminator |
Reserved Identifiers¶
Grammar-Kit reserves these identifiers for internal use:
| Identifier | Usage |
|---|---|
| regexp: | Prefix for regexp token definitions in the tokens attribute |
| #auto | Value for recoverWhile that means "not in NEXT set of this rule" |
| TokenSets | Generated inner class name for token set constants |
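As a brief sketch of the #auto value in use, the hypothetical rule below asks the generator to derive the recovery predicate instead of referencing a hand-written one. The value is quoted because #auto is not a plain identifier.

```bnf
// Hypothetical rule: recovery predicate derived automatically.
property ::= id '=' value ';' {pin=2 recoverWhile="#auto"}
```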