Re: [RFC] [Discussion] Add WHATWG compliant URL parsing API

Date: Sun, 23 Feb 2025 17:30:14 +0000
Subject: Re: [RFC] [Discussion] Add WHATWG compliant URL parsing API
References: 1  Groups: php.internals 
Hi Máté,

I've read the latest version of the RFC, and while I very much like it, I have some
remarks.

1.
The paragraph at the beginning of the RFC, in the "Relevant URI specifications > WHATWG URL"
section, seems to be incomplete.

2.
I don't really understand how the UninitializedUriException can be thrown.
Is it somehow possible to create an instance of a URI without initializing it?
This seems unwise in general.

3.
I'm not really convinced by using the constructor to create a URI object.
I think it would be better for it to be private/throwing and to have two static constructors,
parse and tryParse,
mimicking the API that exists for creating an instance of a backed enum from a scalar
(from and tryFrom).
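
For illustration, a minimal sketch of that shape next to the backed enum
precedent. The class name SketchUri and its validation rule are hypothetical
placeholders, not the API proposed in the RFC:

```php
<?php
declare(strict_types=1);

// Existing precedent: backed enums offer a throwing and a nullable factory.
enum Suit: string
{
    case Hearts = 'H';
    case Spades = 'S';
}

// Hypothetical sketch only; the name and the validation rule are
// illustrative assumptions, not the RFC's implementation.
final class SketchUri
{
    // Private constructor: instances can only come from the factories below.
    private function __construct(private readonly string $uri) {}

    // Throwing factory, analogous to Suit::from().
    public static function parse(string $uri): self
    {
        return self::tryParse($uri)
            ?? throw new InvalidArgumentException("Invalid URI: $uri");
    }

    // Nullable factory, analogous to Suit::tryFrom().
    public static function tryParse(string $uri): ?self
    {
        // Placeholder check: require a scheme followed by a colon.
        if (preg_match('/^[A-Za-z][A-Za-z0-9+.-]*:/', $uri) !== 1) {
            return null;
        }
        return new self($uri);
    }

    public function toRawString(): string
    {
        return $this->uri;
    }
}
```

With this shape, callers pick the failure mode at the call site, exactly as
they do with Suit::from('H') versus Suit::tryFrom('X').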

4.
I think changing the name of the toString method to toRawString would better
match the rest of the proposed API,
and would also remove the question as to why it isn't the magic method __toString.

5.
I will echo Tim's concerns about the non-final-ity of the URI classes.
This seems like a recipe for disaster.
I can _maybe_ see the usefulness of extending Rfc3986\Uri by a subclass Ldap\Uri,
but being able to extend the WhatWg URI makes absolutely no sense.
The point of these classes is that if you have an instance of one of these, you *know* that you have
a valid URI.
Being able to subclass a URI and mess with the equals, toString,
toNormalizedString methods throws away all the safety guarantees provided by
***possessing*** a Uri instance.
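
A minimal sketch of the problem, using hypothetical stand-in classes (OpenUri
and SloppyUri are illustrative names, not the RFC's classes):

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for a non-final URI class; not the RFC's class.
class OpenUri
{
    public function __construct(protected string $uri) {}

    public function equals(OpenUri $other): bool
    {
        return $this->uri === $other->uri;
    }

    public function toString(): string
    {
        return $this->uri;
    }
}

// A subclass can silently change what the inherited methods mean.
class SloppyUri extends OpenUri
{
    public function equals(OpenUri $other): bool
    {
        return true; // claims equality with every other URI
    }

    public function toString(): string
    {
        return 'not a uri at all';
    }
}

// Code that type-hints OpenUri can no longer trust what it receives.
function sameUri(OpenUri $a, OpenUri $b): bool
{
    return $a->equals($b);
}
```

Here sameUri() is not even symmetric once a SloppyUri is involved, and
toString() no longer returns anything resembling the parsed input. Declaring
the class final rules this out.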

Moreover, as Tim previously mentioned, if you subclass you need to override all the methods,
and you might end up in a situation similar to the one that led to the removal of the common
Uri interface in the first place.
That basically suggests creating a new Uri class instead of extending *anyway*.

Making these classes final just removes a lot of edge cases, some of which I don't think we can
anticipate,
while also simplifying other aspects, like serialization,
as you won't need that weird __uri property any longer.

Similarly, I don't understand why the WhatWgError is not final.
Even if subclassing of the Uri classes is allowed, any error a subclass would have would not be
a WhatWg one,
so why should you be able to extend it?

6.
Parsing API and why Monads wouldn't solve the soft error case anyway.
This is just a remark, but you wouldn't be able to really implement a monad if you want to
support partial success.
So I'm not sure mentioning the lack of monadic support in PHP is the best argument against them
for this RFC.

Best regards,

Gina P. Banyard

On Friday, 28 June 2024 at 21:06, Máté Kocsis <[email protected]> wrote:

> Hi Everyone,
>
> I've been working on a new RFC for a while now, and time has come to present it to a wider
> audience.
>
> Last year, I learnt that PHP doesn't have built-in support for parsing URLs according to
> any well established standards (RFC 1738 or the WHATWG URL living standard), since the parse_url()
> function is optimized for performance instead of correctness.
>
> In order to improve compatibility with external tools consuming URLs (like browsers), my new
> RFC would add a WHATWG compliant URL parser functionality to the standard library. The API itself is
> not final by any means, the RFC only represents how I imagined it first.
>
> You can find the RFC at the following link: https://wiki.php.net/rfc/url_parsing_api
>
> Regards,
> Máté

