Re: Pattern matching details questions

From: Date: Fri, 07 Feb 2025 00:35:25 +0000
Subject: Re: Pattern matching details questions
References: 1 2 3 4  Groups: php.internals 
On Thu, Feb 6, 2025 at 10:43 PM Christoph M. Becker <[email protected]> wrote:
>
> On 06.02.2025 at 20:24, Larry Garfield wrote:
>
> > On Thu, Feb 6, 2025, at 3:05 AM, Valentin Udaltsov wrote:
> >
> >> Are there any plans to upgrade the parser to bypass these limitations?
> >> I remember Nikita shared some thoughts on why this is not trivial in
> >> https://wiki.php.net/rfc/arrow_functions_v2.
> >> Maybe something has changed since then?
> >
> > I'm not aware of any plans to change the parser.  That would be a rather dramatic and
> > invasive change.
>
> There have been ideas to use some more powerful features of bison[1],
> like GLR, so that would not necessarily be a drastic and invasive
> change.  I'm not aware of any concrete plans, and these more powerful
> features are not without downsides.

I don't think there's a big incentive to switch to a GLR parser right
now. First off, I don't believe it actually solves the ambiguity
problem we've described in this thread (`class C { public $prop = 42
is Foo{}; }`), which is not a lookahead limitation but a full-blown
syntax ambiguity. *Technically*, it could be solved in our current
LALR(1) parser by duplicating the expr production, removing pattern
matching from the copy and using that copy solely for property
initializers, but this is a bad long-term solution.

Secondly, single-token lookahead grammars are easier for machines and
humans to understand. Unfortunately, it's hard to predict future syntax
changes, but I believe we have managed to find acceptable compromises
so far. It's worth noting that some newer languages also strive to
avoid grammars that need more than one token of lookahead. As an
example, see Rust's turbofish syntax (e.g. `Vec::<u32>`), used for
generics in expression context to avoid confusion with the `<`
less-than comparison.
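
To make the turbofish point concrete, here is a minimal Rust sketch.
The function names are mine and purely illustrative. In expression
position, generic arguments are introduced with `::<`, so the parser
never has to guess whether `<` starts a comparison.

```rust
/// Parses a decimal string. The turbofish names the target type
/// inside an expression.
fn parse_u32(text: &str) -> u32 {
    text.parse::<u32>().expect("not a valid u32")
}

/// Doubles every element. The turbofish selects the collection type.
fn doubled(values: &[u32]) -> Vec<u32> {
    values.iter().map(|v| v * 2).collect::<Vec<u32>>()
}

/// Calls an associated function on a generic type. Writing
/// `Vec<u32>::new()` here instead is rejected by the compiler,
/// because `<` in expression position is read as a comparison.
fn empty_vec() -> Vec<u32> {
    Vec::<u32>::new()
}
```

In type position (e.g. the `Vec<u32>` return types above) no `::` is
needed, since a comparison cannot appear there.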

Also worth noting: switching to a GLR parser might cause a significant
amount of work for nikic/PHP-Parser, which is based on
ircmaxell/php-yacc, which can only generate LALR(1) parsers. It might
cause even more problems for token-based tools. Sticking with the
generics example, `[bar < Bar, Baz > ()]` will require a lot of
scanning to decide whether the spaces between `bar` and `<` should be
removed. The `::<` turbofish syntax, on the other hand, immediately
indicates generics.
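
As a rough sketch of what that means for a token-based tool (written
in Rust to stay with the turbofish example): the token type and both
functions below are hypothetical and not taken from any real
formatter. The first check looks at one neighbouring token. The second
is a heuristic that has to walk forward to the matching `>` and then
peek at the token after it, and even then its answer is only a guess.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Ident(String),
    Lt,
    Gt,
    Comma,
    LParen,
    RParen,
    PathSep, // `::`
}

/// Turbofish-style rule: `<` opens generics iff the previous token is `::`.
fn opens_generics_turbofish(tokens: &[Tok], lt: usize) -> bool {
    lt > 0 && tokens.get(lt) == Some(&Tok::Lt) && tokens[lt - 1] == Tok::PathSep
}

/// Heuristic without a marker: scan to the matching `>` and treat the
/// `<` as generics only if `(` follows. Returns the decision and the
/// number of tokens inspected after the `<`.
fn opens_generics_by_scanning(tokens: &[Tok], lt: usize) -> (bool, usize) {
    if tokens.get(lt) != Some(&Tok::Lt) {
        return (false, 0);
    }
    let mut depth = 1usize;
    let mut inspected = 0usize;
    let mut i = lt + 1;
    while i < tokens.len() {
        inspected += 1;
        match &tokens[i] {
            Tok::Lt => depth += 1,
            Tok::Gt => {
                depth -= 1;
                if depth == 0 {
                    // One more token is needed to see whether a call follows.
                    let next = tokens.get(i + 1);
                    if next.is_some() {
                        inspected += 1;
                    }
                    return (next == Some(&Tok::LParen), inspected);
                }
            }
            Tok::Ident(_) | Tok::Comma => {}
            // Anything else cannot be part of a simple type list.
            _ => return (false, inspected),
        }
        i += 1;
    }
    (false, inspected)
}

/// Small helper for building identifier tokens.
fn ident(name: &str) -> Tok {
    Tok::Ident(name.to_string())
}
```

For the tokens of `bar < Bar, Baz > ()`, the scanning version inspects
five tokens after the `<` before it can answer, and that number grows
with the length of the type list. For `bar::<Bar, Baz>()`, the
turbofish check needs only the token before the `<`.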

Anyway, it seems we have slightly gone off-topic. :)

Ilija

