Hi Máté,
> On Mar 18, 2025, at 15:15, Máté Kocsis <[email protected]> wrote:
>
> There's no way I would have written an implementation from scratch. I'm using the url
> module of the Lexbor C library (https://github.com/lexbor/lexbor/) for handling WHATWG URLs.
> It's already bundled in core, and it's also battle tested, and it has exceptional
> maintenance.
I did not mean to imply writing a parser from scratch; my apologies for phrasing it poorly.
> All I had to implement is the glue between userland and the C library.
That is more what I was getting at. Rowbot has a lot of what looks to be good design work on
structures that come out of the parsing, in addition to a separate parser class.
The RFC might benefit from an explicit and intentional review of, and maybe incorporation of, some
of the pre-existing Rowbot design work. At least one thing from Rowbot is absolutely not applicable
to the RFC (e.g. the PSR-3 logging); maybe none of the rest of it will be applicable either, but as
prior art from someone acknowledged in the WHATWG-URL spec, I think it bears your close attention.
As an overview, the following is a brief comparison between Rowbot and the RFC; any missed or
misrepresented functionality is unintentional.
* * *
## RFC
One non-final readonly Url class:
- 5 getRaw...() methods, 8 get...() methods, and one get...ForDisplay() method
- immutability via 8 with...() methods, broadly expecting properly-encoded arguments, and
soft-erroring on invalid characters
- a static parse() method, with relative parsing capability and a place to capture errors
- equals() to compare two URLs
- toString() for machine-friendly string recomposition
- toDisplayString() for human-friendly string recomposition
- resolve() to resolve a relative URL using the current URL as the base
- serialize/deserialize; "the serialized form only includes the recomposed URI itself exposed
as the __uri field, but the individual properties or URI components are not present."
- no URLSearchParams implementation
## Rowbot
(None of the classes are readonly or final; these look to hew closely to the WHATWG-URL spec.)
A BasicURLParser class:
- affords relative parsing capability and an optional parameter for the target URLRecord
- returns a URLRecord
A URLRecord class:
- public mutable properties for the URL components
- $scheme is a Scheme implementation with equals() and other is...() methods
- $host is a HostInterface (and implementations) with equals() and other is...() methods
- $path is a PathInterface (and PathList implementation) with PathSegment manipulation methods
- setUsername() and setPassword() mutators
- serializing
- getOrigin(), includesCredentials(), isEqual()
A URL class:
- Composed of a URLRecord and a URLSearchParams object
- Constructor takes a string, parses it to a URLRecord, and retains the URLRecord
- a static parse() method with relative parsing, as a convenience method
- __toString() and toString() return the serialized URLRecord
- Virtual properties for $href, $origin, $protocol, $username, $password, $host, $hostname, $port,
$pathname, $search, $searchParams, $hash
- Mutability of virtual properties via magic __set()
- Readability of virtual properties via magic __get()
A URLSearchParams class:
- search params manipulation methods
- implements Countable, Iterator, Stringable
- composed of a QueryList implementation and (optionally) the originating URLRecord
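
To make the main stylistic difference concrete, here is a minimal sketch of the two mutation styles compared above: with...() methods that return a modified copy, versus virtual properties written through magic __set(). Both class names and their internals are hypothetical stand-ins invented for illustration; this is neither the RFC's implementation nor Rowbot's code, and it does no real URL parsing or validation.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for the RFC style: readonly state, changes via with...().
final class ImmutableUrlSketch
{
    public function __construct(
        private readonly string $scheme,
        private readonly string $host,
    ) {}

    public function getHost(): string
    {
        return $this->host;
    }

    // Returns a modified copy; the original instance is left untouched.
    public function withHost(string $host): self
    {
        return new self($this->scheme, $host);
    }

    public function toString(): string
    {
        return $this->scheme . '://' . $this->host . '/';
    }
}

// Hypothetical stand-in for the Rowbot style: virtual properties via magic methods.
final class MutableUrlSketch
{
    /** @var array<string, string> simplified placeholder for a URL record */
    private array $record;

    public function __construct(string $scheme, string $host)
    {
        $this->record = ['scheme' => $scheme, 'hostname' => $host];
    }

    // Reads a virtual property; only two are modelled in this sketch.
    public function __get(string $name): mixed
    {
        return match ($name) {
            'hostname' => $this->record['hostname'],
            'href' => $this->record['scheme'] . '://' . $this->record['hostname'] . '/',
            default => throw new InvalidArgumentException("Unknown property: $name"),
        };
    }

    // Writes a virtual property in place; only $hostname is writable here.
    public function __set(string $name, mixed $value): void
    {
        if ($name !== 'hostname') {
            throw new InvalidArgumentException("Unsupported property: $name");
        }
        $this->record['hostname'] = (string) $value;
    }
}

$a = new ImmutableUrlSketch('https', 'example.com');
$b = $a->withHost('example.org');   // $a keeps its original host

$c = new MutableUrlSketch('https', 'example.com');
$c->hostname = 'example.org';       // $c itself is modified
```

In the first style every change yields a new object, so any code holding the original keeps seeing the original value; in the second, every holder of the object sees the change.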
* * *
-- pmj