-
Notifications
You must be signed in to change notification settings - Fork 8k
ext/uri: Fix the distinction between an empty and a missing query/fragment for WHATWG URLs #20208
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: PHP-8.5
Are you sure you want to change the base?
Conversation
…gment for WHATWG URLs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to give this a proper look tomorrow. Looking at the tests the output makes sense though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks right to me.
As a follow-up (likely master-only), you can use ZVAL_STRINGL_FAST
which will avoid allocating empty strings (and 1-byte-long strings) by using interned strings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I see, the WHATWG URL standard treats missing and empty queries and fragments the same with regard to the getters. The difference is however preserved when recomposing the URL (which is also the case for the toAsciiString()
method in PHP already) - unless explicitly overwritten. Which makes for funky behavior:
> u = new URL('https://example.com/?#');
URL {
href: 'https://example.com/?#',
origin: 'https://example.com/',
protocol: 'https:',
username: '',
password: '',
host: 'example.com',
hostname: 'example.com',
port: '',
pathname: '/',
search: '',
searchParams: URLSearchParams {},
hash: ''
}
> u.search
''
> u.hash
''
> u.toString()
'https://example.com/?#'
> u.hash = u.hash
''
> u.search = u.search
''
> u.toString()
'https://example.com/'
It thus makes sense to me to consider this another case of:
Getters of Uri\WhatWg\Url have a few gotchas for the ones who are inherently familiar with the WHATWG URL specification: they don't (entirely) follow the “getter steps” that are defined by the specification, but the individual components are returned directly without any other changes that the “getter steps” would otherwise specify.
And make them differ between missing and empty to be consistent with the recomposition and with Rfc3986.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which makes for funky behavior:
I am seeing that this also exists with PHP after applying the patch:
$url1 = Uri\WhatWg\Url::parse("https://example.com/?#");
$url2 = $url1->withFragment($url1->getFragment());
var_dump($url1->getFragment()); // string(0) ""
var_dump($url2->getFragment()); // NULL
var_dump($url2->toAsciiString()); // string(21) "https://example.com/?"
So perhaps the correct solution is to not make getFragment()
and getQuery()
nullable and always return an empty string? That should still be okay for “baseline compatibility” between RFC 3986 and WHATWG.
Selecting “Request Changes” for visibility.
Similarly for the username:
I'd argue that the username is empty in this case, not missing. It currently returns
Here the |
BTW It's possible to "fix" this case by the following changes: :D
I love that there's this loophole TBH... 😅 And I find it important to have a distinction between empty/missing, especially if toString() returns the leading
As sad as it is, I'm OK to get rid of nullability here.. It doesn't look right after seeing what happens in case of |
After thinking about this a little bit more.. I'm not sure anymore if it's an issue at all. Apparently, WHATWG URL doesn't give a big importance for the distinction between missing and empty: this is best observed in the getter steps. When passing an empty string to the What matters more for me is that:
All these seem more important to me than completely symmetrical results when an empty query/fragment is passed to a wither. |
Split from #19970