Re: [RFC] [Discussion] Add WHATWG compliant URL parsing API

Date: Mon, 24 Feb 2025 12:48:14 +0000
Subject: Re: [RFC] [Discussion] Add WHATWG compliant URL parsing API
Groups: php.internals
Hi

On 2025-02-24 12:08, Nicolas Grekas wrote:
> The situation I'm talking about is when one accepts an argument described as function (\Uri\WhatWg\Url $url). If the Url class is final, this signature means only one possible implementation can ever be passed: the native one. Composition cannot be achieved because there's no type to compose.
Yes, that's the point: The behavior and the type are intimately tied together. The Uri/Url classes represent values, not services. You wouldn't extend an int either. For DateTimeImmutable, the fact that inheritance is legal causes a ton of needless bugs (especially around serialization behavior).
> Fine-tuning the behavior provided by the RFC is what we might be most interested in, but we should not forget that we also ship a type. By making
For a given specification (RFC 3986 / WHATWG) there is exactly one correct interpretation of a given URL. “Fine-tuning” means that you are no longer following the specification.
> the type non-final, we keep things open enough for userland to build on it.
This works:
    final class HttpUrl {
        private readonly \Uri\Rfc3986\Uri $uri;
        public function __construct(string $uri) {
            $this->uri = new \Uri\Rfc3986\Uri($uri);
            if ($this->uri->getScheme() !== 'http') {
                throw new ValueError('Scheme must be http');
            }
        }
        public function toRfc3986(): \Uri\Rfc3986\Uri {
            return $this->uri;
        }
    }
Userland can easily build convenience wrappers around the classes; the wrappers just need to export to the native classes, which then guarantee that the result is fully validated and actually a valid URI/URL. Keep in mind that the ext/uri extension will always be available, so users can rely on the native implementation.
> By making the classes non-final, there will be one base type for userland to build upon (the alternative would be to define a native UrlInterface, but that would increase complexity for little to no gain IMHO, although it would solve my main concern).
Mate already explained why a native UriInterface was intentionally removed from the RFC in https://news-web.php.net/php.internals/126425.
> The RFC is also missing whether __debugInfo returns raw or non-raw components. Then, I'm wondering if we need this per-component break for debugging at all? It might be less confusing (on this encoding aspect) to dump basically what __serialize() returns (under another key than __uri of course).
That would also work for me.
>> It can make sense to normalize a hostname, but not the path. My usual example against normalizing the path is that SAML signs the *encoded* URI instead of the payload and changing the case in percent-encoded characters is sufficient to break the signature.
> I would be careful with this argument: signature validation should be done on raw bytes. Requiring an object to preserve byte-level accuracy while the very purpose of OOP is to provide abstractions might be conflicting. The signing topic can be solved by keeping the raw signed payload around.
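For illustration, here is a minimal sketch of the percent-encoding concern discussed above. It does not use ext/uri or any real SAML library: the key, the URI, and the use of HMAC as a stand-in for the actual signature algorithm are all assumptions made up for this example. It only shows that uppercasing the hex digits of a percent-encoded triplet (a case normalization described in RFC 3986, section 6.2.2.1) yields an equivalent URI with different bytes, so a signature computed over the raw string no longer verifies.

```php
<?php
declare(strict_types=1);

// Hypothetical signing key and URI, chosen for illustration only.
$key = 'example-secret';
$rawUri = 'https://idp.example.com/sso?RelayState=a%2fb';

// Uppercase the hex digits of every percent-encoded triplet. This stands in
// for what a normalizing getter might do; it is not the ext/uri implementation.
$normalizedUri = preg_replace_callback(
    '/%[0-9A-Fa-f]{2}/',
    static fn (array $m): string => strtoupper($m[0]),
    $rawUri
);

// HMAC is used here as a simple stand-in for the real signature scheme.
$signature = hash_hmac('sha256', $rawUri, $key);
$recomputed = hash_hmac('sha256', $normalizedUri, $key);

// The two strings are equivalent as URIs, but not as byte strings.
var_dump($rawUri === $normalizedUri);            // bool(false)
var_dump(hash_equals($signature, $recomputed));  // bool(false)
```

Only the case of "%2f" changes, which is enough for the comparison to fail.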
Yes, the SAML signature behavior is wrong, but I did not write the SAML specification. I just pointed out that it is a possible use case where choosing between the raw and the normalized form depends on the component, and where a “get all components” function would be dangerous.
Best regards
Tim Düsterhus
