How to resolve a shift/reduce conflict for greedy modifiers in a Tree-sitter grammar for VB.NET? #382

@govindbanura

I am developing a Tree-sitter grammar for VB.NET and have run into a persistent parsing issue with member declarations that have multiple modifiers. The parser fails to be "greedy" and consumes only the first modifier, misinterpreting the second modifier as a variable name.

This appears to be a classic shift/reduce conflict, but the standard solutions I've tried (using prec, prec.dynamic, and the conflicts array) have not resolved the issue, often because they interfere with other precedence rules in the grammar.

The Problem

Given a simple VB.NET class, the parser should correctly handle fields with both single and multiple modifiers.

Minimal VB.NET Example:

Public Class MyTestClass
    ' This line with a single modifier parses correctly.
    Private _someField As String

    ' This line with multiple modifiers fails.
    Private ReadOnly _anotherField As Integer
End Class

When parsing the line Private ReadOnly _anotherField As Integer, the parser incorrectly stops after Private and tries to parse ReadOnly as the field's name.

Incorrect AST Output:

The resulting Abstract Syntax Tree for the failing line looks like this, clearly showing the error (the -- annotations give the source text each node covers):

(field_declaration
  (modifiers
    (member_modifier)  -- "Private"
  )
  (variable_declarator
    (identifier)       -- "ReadOnly"
  )
  (ERROR)              -- "_anotherField As Integer"
)

The modifiers rule is not greedy, and an ERROR node is produced.

Relevant Grammar Snippet (grammar.js)

Here are the key rules from my grammar.js that are involved in this issue.

module.exports = grammar({
  name: 'vbnet',
  // ... other rules and extras

  rules: {
    // ...

    member_modifier: $ => choice(
      ci('Public'), ci('Private'), ci('Protected'), ci('Friend'),
      ci('Protected Friend'), ci('Private Protected'), ci('ReadOnly'),
      ci('WriteOnly'), ci('Shared'), ci('Shadows'), ci('MustInherit'),
      ci('NotInheritable'), ci('Overrides'), ci('MustOverride'),
      ci('NotOverridable'), ci('Overridable'), ci('Overloads'),
      ci('WithEvents'), ci('Widening'), ci('Narrowing'),
      ci('Partial'), ci('Async'), ci('Iterator')
    ),

    modifiers: $ => repeat1($.member_modifier),

    _type_member_declaration: $ => choice(
      // ... other members like empty_statement, inherits_statement
      prec(2, $.constructor_declaration),
      prec(1, $.method_declaration),
      prec(1, $.property_declaration),
      // ... other members with precedence
      $.field_declaration // Lower precedence
    ),

    field_declaration: $ => seq(
      optional(field('attributes', $.attribute_list)),
      field('modifiers', $.modifiers),
      commaSep1($.variable_declarator),
      $._terminator
    ),

    variable_declarator: $ => seq(
      field('name', $.identifier),
      optional($.array_rank_specifier),
      optional($.as_clause),
      optional(seq('=', field('initializer', $._expression)))
    ),

    // ... other rules
  }
});

function ci(keyword) {
  return new RegExp(keyword.split('').map(letter => `[${letter.toLowerCase()}${letter.toUpperCase()}]`).join(''));
}
// ... other helpers
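
For reference, the ci helper turns each keyword into a regular expression with one two-letter character class per input character. The sketch below is only an illustration and runs in plain Node, outside of Tree-sitter: it repeats the helper unchanged and prints the expansion for one modifier. The identifierPattern constant and the matchesWhole function are hypothetical; the pattern stands in for the grammar's real identifier rule, which is not part of the snippet above. The sketch shows that a modifier pattern and an identifier-like pattern can both match the whole word ReadOnly, which may or may not be related to the behavior described here.

```javascript
// Same helper as in grammar.js, repeated so this file runs on its own in Node.
function ci(keyword) {
  return new RegExp(
    keyword
      .split('')
      .map(letter => `[${letter.toLowerCase()}${letter.toUpperCase()}]`)
      .join('')
  );
}

// Hypothetical stand-in for the identifier rule (an assumption, the real
// rule is not shown in the snippet).
const identifierPattern = /[A-Za-z_][A-Za-z0-9_]*/;

// Hypothetical helper: true when the pattern matches the entire input text.
function matchesWhole(pattern, text) {
  const match = pattern.exec(text);
  return match !== null && match.index === 0 && match[0].length === text.length;
}

// Expansion of one modifier keyword.
console.log(ci('ReadOnly').source); // [rR][eE][aA][dD][oO][nN][lL][yY]

// The keyword pattern is case-insensitive by construction.
console.log(matchesWhole(ci('ReadOnly'), 'readonly')); // true

// The identifier-like pattern matches the same word at the same length.
console.log(matchesWhole(identifierPattern, 'ReadOnly')); // true
```

Because ci returns a RegExp rather than a string literal, every modifier in member_modifier reaches Tree-sitter as a regex token. Whether that plays a role in how ReadOnly is tokenized after Private is an open question on my side.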

The Question

How can I modify this Tree-sitter grammar to correctly and "greedily" parse multiple consecutive modifiers in a field_declaration, while still correctly resolving the ambiguities between different types of member declarations (e.g., a field_declaration vs. a method_declaration)?

I can provide more details of the grammar if needed.
