Description
I am developing a Tree-sitter grammar for VB.NET and have run into a persistent parsing issue with member declarations that have multiple modifiers. The parser fails to be "greedy" and consumes only the first modifier, misinterpreting the second modifier as a variable name.
This appears to be a classic shift/reduce conflict, but the standard solutions I've tried (using prec, prec.dynamic, and the conflicts array) have not resolved the issue, often because they interfere with other precedence rules in the grammar.
The Problem
Given a simple VB.NET class, the parser should correctly handle fields with both single and multiple modifiers.
Minimal VB.NET Example:
Public Class MyTestClass
    ' This line with a single modifier parses correctly.
    Private _someField As String

    ' This line with multiple modifiers fails.
    Private ReadOnly _anotherField As Integer
End Class
When parsing the line Private ReadOnly _anotherField As Integer, the parser incorrectly stops after Private and tries to parse ReadOnly as the field's name.
Incorrect AST Output:
The resulting Abstract Syntax Tree for the failing line looks like this, clearly showing the error:
(field_declaration
  (modifiers
    (member_modifier)  -- "Private"
  )
  (variable_declarator
    (identifier)  -- "ReadOnly"
  )
  (ERROR)  -- "_anotherField As Integer"
)
The modifiers rule is not greedy, and an ERROR node is produced.
Relevant Grammar Snippet (grammar.js)
Here are the key rules from my grammar.js that are involved in this issue.
module.exports = grammar({
  name: 'vbnet',
  // ... other rules and extras
  rules: {
    // ...
    member_modifier: $ => choice(
      ci('Public'), ci('Private'), ci('Protected'), ci('Friend'),
      ci('Protected Friend'), ci('Private Protected'), ci('ReadOnly'),
      ci('WriteOnly'), ci('Shared'), ci('Shadows'), ci('MustInherit'),
      ci('NotInheritable'), ci('Overrides'), ci('MustOverride'),
      ci('NotOverridable'), ci('Overridable'), ci('Overloads'),
      ci('WithEvents'), ci('Widening'), ci('Narrowing'),
      ci('Partial'), ci('Async'), ci('Iterator')
    ),
    modifiers: $ => repeat1($.member_modifier),
    _type_member_declaration: $ => choice(
      // ... other members like empty_statement, inherits_statement
      prec(2, $.constructor_declaration),
      prec(1, $.method_declaration),
      prec(1, $.property_declaration),
      // ... other members with precedence
      $.field_declaration // Lower precedence
    ),
    field_declaration: $ => seq(
      optional(field('attributes', $.attribute_list)),
      field('modifiers', $.modifiers),
      commaSep1($.variable_declarator),
      $._terminator
    ),
    variable_declarator: $ => seq(
      field('name', $.identifier),
      optional($.array_rank_specifier),
      optional($.as_clause),
      optional(seq('=', field('initializer', $._expression)))
    ),
    // ... other rules
  }
});

// Builds a case-insensitive RegExp for a keyword, one character class per letter.
function ci(keyword) {
  return new RegExp(keyword.split('').map(letter => `[${letter.toLowerCase()}${letter.toUpperCase()}]`).join(''));
}
// ... other helpers
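For reference, here is a small standalone sketch (plain Node.js, not part of the grammar) of what the ci helper produces. The identifierPattern below is a hypothetical stand-in, since my real identifier rule is not shown above, and matchLength is just an illustrative helper. The sketch only shows that a keyword pattern built by ci and a typical identifier pattern both match ReadOnly at the same length, which may be relevant to how the lexer chooses between the two tokens. I have not confirmed that this is the cause.

```javascript
// Same helper as in the grammar snippet.
function ci(keyword) {
  return new RegExp(
    keyword.split('').map(letter => `[${letter.toLowerCase()}${letter.toUpperCase()}]`).join('')
  );
}

// Hypothetical identifier pattern, for illustration only (an assumption,
// not the actual rule from my grammar).
const identifierPattern = /[A-Za-z_][A-Za-z0-9_]*/;

// Illustrative helper: length of the match at the start of the input, or 0 if none.
function matchLength(pattern, text) {
  const m = new RegExp('^(?:' + pattern.source + ')').exec(text);
  return m ? m[0].length : 0;
}

const readOnly = ci('ReadOnly');
const input = 'ReadOnly _anotherField As Integer';

console.log(readOnly.source);                       // [rR][eE][aA][dD][oO][nN][lL][yY]
console.log(matchLength(readOnly, input));          // 8
console.log(matchLength(identifierPattern, input)); // 8

// Multi-word modifiers built this way accept exactly one space between words.
console.log(ci('Protected Friend').test('protected friend'));  // true
console.log(ci('Protected Friend').test('Protected  Friend')); // false
```

Both patterns match the same eight characters, so match length alone cannot separate the keyword from an identifier here. The last two lines also show that ci('Protected Friend') would not match the two words when they are separated by more than one space.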
The Question
How can I modify this Tree-sitter grammar to correctly and "greedily" parse multiple consecutive modifiers in a field_declaration, while still correctly resolving the ambiguities between different types of member declarations (e.g., a field_declaration vs. a method_declaration)?
I can provide more details of the grammar if needed.