Re: [RFC] Decoding HTML and the Ambiguous Ampersand

From: Date: Sun, 25 Aug 2024 21:56:06 +0000
Subject: Re: [RFC] Decoding HTML and the Ambiguous Ampersand
References: 1 2 3 4 5 6  Groups: php.internals 

> On Aug 25, 2024, at 4:17 PM, Máté Kocsis <[email protected]> wrote:
> 
> Hi Christoph, Dennis,
> 
>> Well, I don't think it would be a big deal to move the bundled lexbor to
>> somewhere where it is always available.  I mean, so far it's only used
>> by ext/dom so it's bundled there, but if other parts of the php-src code
>> base would use it, we could put it elsewhere.
> 
> Exactly. You might be aware that I'm working on an "uri" extension
> (https://externals.io/message/123997)

Yes, I only briefly saw that before, but I’m excited, because I’ve wanted very much to be
able to properly parse URLs within PHP. I was also interested in seeing whether we could get Ada
into the language.

As with HTML parsing, I see much value in having additional interfaces that aren’t a DOM interface
but which are designed for specific software purposes.

> and it also needs some parts of lexbor. My implementation currently depends on ext/dom
> for simplicity's sake, however if the vote once passes, this temporary solution has to be
> changed.
> Therefore we previously agreed with Niels that we would make lexbor an "internal
> extension" (similar to mysqlnd), or
> at least we would somehow find a way for it to be always available, just like how Christoph
> said.

With all the improvements going around PHP these days, I find it extremely important to finally be
able to reliably and safely understand some of the most basic content that we produce and parse:
HTML and URLs.

Although the user-space libraries are of varying completion and quality, all of them suffer from the
fact that it’s so challenging to efficiently parse most content using PHP. Getting these things
baked into the language of the web will bring a potent uplift to the entire ecosystem, both because
there will be less corruption and because performance won’t suffer in getting there.

> 
> Regards,
> Máté
> 


