Am 18.07.2024 um 01:15 schrieb Juliette Reinders Folmer <[email protected]>:
Hi all,
I recently discovered a change was made to the Tokenizer in PHP 8.3.0 which now allows for a comment
to exist between the yield
and from
keyword from the yield
from
keyword.
Before PHP 8.3, this was a parse error.
This change was not documented (anywhere) nor publicized, which is why I only recently came across
it and opened a bug report on GitHub about it [1].
This change is a breaking change for projects consuming the token stream, but more than that, it
introduces an inconsistency in the Tokenizer as there is no other single token in PHP which can
contain a comment (aside from comments tokens of course).
Comments were even explicitly forbidden in the PHP 8.0 RFC which changed the tokenization of
namespaced names [2].
Some more detailed explanation and examples can be found in the GH thread [3] and a detailed
analysis of the tokenization change can be found in a ticket in the PHP_CodeSniffer repo [4]
(details in a fold out in the comment).
In my opinion the change is a bug and should be reverted, but opinions on this are divided.
After a discussion in the ticket on GitHub, it was suggested to ask for a third/fourth/fifth opinion
here on the list.
Pros for reverting it:
- Consistency with other tokens in the token stream, none of which allow comments within the token.
- Consistency with the 7 years before in which this was a parse error.
- The change was never documented, not in the changelog, news, nor in the upgrading guide.
Cons against reverting it:
- The change has been in PHP 8.3 releases for a little over seven months.
- The odd few people may have accidentally discovered the bug and taken advantage by introducing
code containing yield /*comment*/ from
into their project, which with a revert would
become a parse error again.
Opinions ?
Of course, the ticket yielded discussion on whether the tokenization of yield from
should be changed anyhow, but please consider that discussion out of scope.
The current question is only about reverting the bug versus regarding it as a new feature and
properly documenting it.
Smile,
Juliette
1: https://github.com/php/php-src/issues/14926
2: https://wiki.php.net/rfc/namespaced_names_as_token#backward_incompatible_changes
3: https://github.com/PHPCSStandards/PHP_CodeSniffer/issues/529#issuecomment-2209337419
4: https://github.com/php/php-src/issues/14926#issuecomment-2227152061
Thanks for bringing this up.
I'm honestly not sure what the best behavior for PHP 8.3 is.
But I'm strongly thinking the behavior of allowing comments between yield and from is natural
and expected. So, for PHP 8.4 I propose we ensure proper tokenization of yield /* comment */ from.
Changing this in PHP 8.3 is likely too much of a compatibility break.
So, assuming we want the proper tokenization behavior for PHP 8.4, I think changing the behavior
intermittently to disallow comments between yield and from in PHP 8.3 is counterproductive.
Moreover, it can - at least - be worked around in tooling by special casing the T_YIELD_FROM token
and extracting the comment from the raw parsed string:
var_dump(token_get_all('<?php yield /* comment */ from $foo;'));
will contain:
[1]=> array(3) { [0]=> int(270) [1]=> string(24) "yield /* comment */ from"
[2]=> int(1) }
It's not optimal, but probably the least bad solution to leave it unchanged in PHP 8.3, have
tooling special case it and properly fix it in PHP 8.4.
Bob