Parser Generation

Grammar-Kit transforms BNF rules into a recursive-descent parser implemented as static Java methods. The generator reads your .bnf file, resolves attributes, and produces up to five categories of output files: the parser class, an element types holder, PSI interfaces, PSI implementation classes, and a visitor class.

Running the Generator

You can trigger generation in three ways.

IDE action is the preferred approach. Open a .bnf file and press Ctrl+Shift+G (Windows/Linux) or Cmd+Shift+G (macOS). Grammar-Kit saves all open files, resolves the output directory from the parserClass attribute's package, runs the generator in a background task, and reports the number of files, total size, and duration in a notification. The output directory structure mirrors the Java package hierarchy.

Command line works for automation outside the IDE:

java -jar grammar-kit.jar src/gen src/grammars/MyLang.bnf

Or with an explicit classpath when the grammar-kit jar does not bundle all dependencies:

java -cp grammar-kit.jar:intellij-deps.jar \
  org.intellij.grammar.Main src/gen src/grammars/

Gradle plugin integrates generation into the build. Use the gradle-grammar-kit-plugin for CI/CD pipelines and team builds. See Build Integration for configuration details.

Warning

The Gradle plugin does not support method mixins (two-pass generation is not implemented). Generic signatures and annotations may also differ. If your grammar uses mixin or psiImplUtilClass method injection, generate from the IDE and commit the output.

Generated Files

The generator produces files in a fixed order. Each category can be controlled through attributes.

Parser class. One Java class (or several, if you use section-level parserClass overrides) containing a static method for each BNF expression. The class implements LightPsiParser and delegates to GeneratedParserUtilBase for marker management, error recovery, and section handling. The name and package come from the parserClass attribute.

Element types holder. An interface containing IElementType constants for all composite (rule) types and, if generate=[tokens="yes"] is in effect, all token types. It also contains a nested Factory class whose createElement method maps each composite element type to its PSI implementation:

public interface MyTypes {
  IElementType STATEMENT = new IElementType("STATEMENT", MyLanguage.INSTANCE);
  IElementType EXPRESSION = new IElementType("EXPRESSION", MyLanguage.INSTANCE);

  IElementType PLUS = new MyTokenType("PLUS");
  IElementType NUMBER = new MyTokenType("NUMBER");

  class Factory {
    public static PsiElement createElement(ASTNode node) {
      IElementType type = node.getElementType();
      if (type == STATEMENT) return new StatementImpl(node);
      if (type == EXPRESSION) return new ExpressionImpl(node);
      throw new AssertionError("Unknown element type: " + type);
    }
  }
}

The holder class name comes from elementTypeHolderClass. Constant name casing is controlled by the element-case and token-case options of generate, for example generate=[element-case="upper" token-case="upper"]. The elementTypePrefix attribute adds a prefix to all composite element type constant names.

PSI interfaces. One interface per non-private, non-fake rule that produces an AST node. Each interface extends the class or interface specified by implements (default: PsiElement) and contains getter methods for child elements.

PSI implementation classes. One class per PSI interface. Each extends the class specified by extends (default: ASTWrapperPsiElement) or the mixin class, and implements its corresponding interface. The class suffix is controlled by psiImplClassSuffix (default: "Impl").

Visitor class. Generated when generate=[visitor="yes"] (the default). Contains a visit method for each PSI type, with dispatch following the extends hierarchy. The visitor class name comes from psiVisitorName (default: "Visitor"). If visitor-value is set, the visitor becomes generic: Visitor<R>.
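
The dispatch chain can be pictured with a minimal, self-contained sketch. All types below (SketchElement, SketchExpr, SketchPlusExpr, SketchLiteralExpr, SketchVisitor) are hypothetical stand-ins, not IntelliJ Platform classes or real generator output. The sketch only assumes that each visit method falls back to the visit method of its supertype.

```java
// Hypothetical stand-ins for generated PSI types; not real Grammar-Kit output.
interface SketchElement {
  void accept(SketchVisitor visitor);
}

interface SketchExpr extends SketchElement {}

// Base visitor: each visit method falls back to its supertype's visit method.
class SketchVisitor {
  public void visitElement(SketchElement o) {}
  public void visitExpr(SketchExpr o) { visitElement(o); }
  public void visitPlusExpr(SketchPlusExpr o) { visitExpr(o); }
  public void visitLiteralExpr(SketchLiteralExpr o) { visitExpr(o); }
}

class SketchPlusExpr implements SketchExpr {
  @Override public void accept(SketchVisitor visitor) { visitor.visitPlusExpr(this); }
}

class SketchLiteralExpr implements SketchExpr {
  @Override public void accept(SketchVisitor visitor) { visitor.visitLiteralExpr(this); }
}

// Overriding only visitExpr is enough to see every expression subtype.
class ExprCounter extends SketchVisitor {
  int count = 0;
  @Override public void visitExpr(SketchExpr o) {
    count++;
    super.visitExpr(o);
  }
}

class VisitorSketchDemo {
  // Counts how many of the given elements reach visitExpr.
  static int countExprs(SketchElement... elements) {
    ExprCounter counter = new ExprCounter();
    for (SketchElement e : elements) e.accept(counter);
    return counter.count;
  }

  public static void main(String[] args) {
    System.out.println(countExprs(new SketchPlusExpr(), new SketchLiteralExpr()));
  }
}
```

Because the specific visit methods delegate upward, a visitor that cares about all expressions overrides one method instead of one per rule.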

Grammar-to-Code Mapping

Understanding how BNF constructs map to Java helps when reading generated code or debugging parse failures.

A sequence becomes a short-circuit && chain. If any part fails before a pin point, the parser rolls back:

rule ::= part1 part2 part3

public static boolean rule(PsiBuilder b, int l) {
  if (!recursion_guard_(b, l, "rule")) return false;
  boolean r;
  Marker m = enter_section_(b);
  r = part1(b, l + 1);
  r = r && part2(b, l + 1);
  r = r && part3(b, l + 1);
  exit_section_(b, m, RULE, r);
  return r;
}
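
Rollback on failure can be sketched without the IntelliJ Platform. ToyBuilder below is a hypothetical stand-in for PsiBuilder and its marker: it only tracks a position in a token list. The rule method restores that position when the && chain fails, which is roughly what happens to the marker when an unpinned rule fails.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for PsiBuilder: a token list plus a cursor.
class ToyBuilder {
  final List<String> tokens;
  int pos = 0;

  ToyBuilder(List<String> tokens) { this.tokens = tokens; }

  // Consume the next token if it equals the expected text.
  boolean consume(String expected) {
    if (pos < tokens.size() && tokens.get(pos).equals(expected)) {
      pos++;
      return true;
    }
    return false;
  }
}

class SequenceSketch {
  // rule ::= 'a' 'b' 'c'
  static boolean rule(ToyBuilder b) {
    int mark = b.pos;            // remember where the rule started
    boolean r = b.consume("a");
    r = r && b.consume("b");     // not evaluated once r is false
    r = r && b.consume("c");
    if (!r) b.pos = mark;        // rollback: undo partially consumed tokens
    return r;
  }

  public static void main(String[] args) {
    ToyBuilder ok = new ToyBuilder(Arrays.asList("a", "b", "c"));
    System.out.println(rule(ok) + " at " + ok.pos);
    ToyBuilder bad = new ToyBuilder(Arrays.asList("a", "b", "x"));
    System.out.println(rule(bad) + " at " + bad.pos);
  }
}
```

The failing input leaves the cursor where it started, so a caller can try another alternative from the same position.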

An ordered choice becomes a fallthrough chain. The parser tries each alternative until one succeeds:

rule ::= part1 | part2 | part3

public static boolean rule(PsiBuilder b, int l) {
  if (!recursion_guard_(b, l, "rule")) return false;
  boolean r;
  Marker m = enter_section_(b);
  r = part1(b, l + 1);
  if (!r) r = part2(b, l + 1);
  if (!r) r = part3(b, l + 1);
  exit_section_(b, m, RULE, r);
  return r;
}

A zero-or-more repetition becomes a while(true) loop that always returns true (zero matches is valid):

rule ::= part *

public static boolean rule(PsiBuilder b, int l) {
  while (true) {
    if (!part(b, l + 1)) break;
  }
  return true;
}
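
The loop above is simplified. Real generated loops typically also carry a progress check (built on current_position_ and empty_element_parsed_guard_ from GeneratedParserUtilBase) so that a part which succeeds without consuming input cannot spin forever. The sketch below illustrates that idea on a hypothetical token cursor; Cursor and RepetitionSketch are illustrative names, not Grammar-Kit API.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical token cursor; not part of Grammar-Kit.
class Cursor {
  final List<String> tokens;
  int pos = 0;

  Cursor(List<String> tokens) { this.tokens = tokens; }
}

class RepetitionSketch {
  // part ::= 'x'?   (always succeeds, may consume nothing)
  static boolean part(Cursor c) {
    if (c.pos < c.tokens.size() && c.tokens.get(c.pos).equals("x")) c.pos++;
    return true;
  }

  // rule ::= part *
  static boolean rule(Cursor c) {
    while (true) {
      int before = c.pos;
      if (!part(c)) break;
      if (c.pos == before) break; // no progress: stop instead of looping forever
    }
    return true;                  // zero matches is still a successful parse
  }

  public static void main(String[] args) {
    Cursor c = new Cursor(Arrays.asList("x", "x", "y"));
    System.out.println(rule(c) + " at " + c.pos);
  }
}
```

Without the position comparison, the optional part would keep returning true at the first non-matching token and the loop would never exit.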

Expression rules that use the extends pattern produce an optimized Pratt parser. Instead of one method per precedence level, Grammar-Kit generates two methods for the root expression rule and a priority table as a comment:

// Expression root: expr
// Operator priority table:
// 0: BINARY(assign_expr)
// 1: BINARY(plus_expr) BINARY(minus_expr)
// 2: BINARY(mul_expr) BINARY(div_expr)
// 3: PREFIX(unary_plus_expr) PREFIX(unary_min_expr)
// 4: POSTFIX(factorial_expr)
// 5: ATOM(literal_expr) PREFIX(paren_expr)
public static boolean expr(PsiBuilder b, int l, int g) { ... }
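
To see how a priority table drives parsing, here is a generic Pratt (precedence-climbing) evaluator. It is a teaching sketch, not the code Grammar-Kit emits: PrattSketch and its helpers are hypothetical, it evaluates numbers instead of building PSI, and it omits the assignment level. It assumes the g parameter plays the role of the enclosing operator's priority, so only operators with a higher priority continue the loop.

```java
import java.util.Arrays;
import java.util.List;

// Generic Pratt evaluator over whitespace-separated tokens; not generated code.
class PrattSketch {
  private final List<String> tokens;
  private int pos = 0;

  private PrattSketch(List<String> tokens) { this.tokens = tokens; }

  // Evaluates an expression such as "1 + 2 * 3".
  static long eval(String input) {
    PrattSketch p = new PrattSketch(Arrays.asList(input.trim().split("\\s+")));
    long value = p.expr(-1);
    if (p.pos != p.tokens.size()) {
      throw new IllegalArgumentException("Unexpected token: " + p.tokens.get(p.pos));
    }
    return value;
  }

  // Priorities mirror the table above: 1 for + and -, 2 for * and /.
  private static int binaryPriority(String op) {
    switch (op) {
      case "+": case "-": return 1;
      case "*": case "/": return 2;
      default: return -1; // not a binary operator
    }
  }

  // g is the priority of the enclosing operator.
  private long expr(int g) {
    long left = prefixOrAtom();
    while (pos < tokens.size()) {
      String op = tokens.get(pos);
      if (op.equals("!") && g < 4) { // postfix factorial, priority 4
        pos++;
        left = factorial(left);
        continue;
      }
      int p = binaryPriority(op);
      if (p < 0 || p <= g) break;    // equal priority stops here: left-associative
      pos++;
      long right = expr(p);
      left = apply(op, left, right);
    }
    return left;
  }

  private long prefixOrAtom() {
    if (pos >= tokens.size()) throw new IllegalArgumentException("Unexpected end of input");
    String t = tokens.get(pos++);
    if (t.equals("-")) return -expr(3); // prefix minus, priority 3
    if (t.equals("(")) {                // parentheses reset the priority
      long v = expr(-1);
      if (pos >= tokens.size() || !tokens.get(pos++).equals(")")) {
        throw new IllegalArgumentException("Expected )");
      }
      return v;
    }
    return Long.parseLong(t);           // literal
  }

  private static long apply(String op, long a, long b) {
    switch (op) {
      case "+": return a + b;
      case "-": return a - b;
      case "*": return a * b;
      default: return a / b;            // only "/" remains
    }
  }

  private static long factorial(long n) {
    long result = 1;
    for (long i = 2; i <= n; i++) result *= i;
    return result;
  }

  public static void main(String[] args) {
    System.out.println(eval("1 + 2 * 3"));
  }
}
```

A single loop with a priority check replaces the chain of one method per precedence level, which is the saving the Pratt approach aims for.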

The generator names sub-expression methods by appending position indices: rule_name_0, rule_name_1_2. Avoid naming your own rules in this rule_name_N1_N2 pattern to prevent conflicts.

Configuration

The generate attribute controls several aspects of the generated output. The most commonly adjusted options:

| Option | Values | Effect |
|---|---|---|
| java | 6, 8, 11 | Java version; affects lambda vs anonymous class syntax |
| names | short, long, classic | Variable names in generated parser (b/l/r vs builder/level/result) |
| fqn | yes, no | Fully qualified names instead of imports |
| element-case | lower, upper, as-is | Casing for element type constants |
| token-case | lower, upper, as-is | Casing for token type constants |
| psi | yes, no | Generate PSI classes |
| visitor | yes, no | Generate visitor class |
| visitor-value | void, type name | Visitor return type parameter |

See Attributes System for the complete options table and default values.

The classHeader attribute adds a comment header to all generated files. It accepts either literal text or a filename (resolved relative to the grammar file's directory):

{
  classHeader="license.txt"
}

Tip

Use generate=[names="long"] during development if you need to step through the generated parser in a debugger. Switch back to names="short" for production to keep the generated code compact.

For Gradle-based generation and CI/CD setup, see Gradle Plugin Setup.