Re: [RFC] Decoding HTML and the Ambiguous Ampersand

From: Jakob Givoni
Date: Sat, 24 Aug 2024 19:56:21 +0000
Subject: Re: [RFC] Decoding HTML and the Ambiguous Ampersand
References: 1 2 3 4
Groups: php.internals

Request: Send a blank email to [email protected] to get a copy of this message

Hi Dennis,

Overall it sounds like a reasonable RFC.

> Dennis:
>
> > Niels:
> >
> > I'm not so sure that the name "decode_html" is self-descriptive enough,
> > it sounds very generic.
>
> The name is not very important to me. For the sake of history, the reason
> I have chosen “decode HTML” is because, unlike an HTML parser, this is
> focused on taking a snippet of HTML “text” content and decoding it into a
> “plain PHP string.”

Why not make it two functions called "decode_html_text" and
"decode_html_attribute"?
Consider the following reasons:
1. The function doesn't actually decode HTML as such; it decodes either an
HTML text node string or an HTML attribute string.
2. It saves the $context parameter and the constants/enums, making the call
significantly shorter.
3. Decoding text and decoding an attribute feel like two significantly
different things. I admit I could be wrong if code like
decode_html($e->isAttribute() ? HtmlContext::Attribute :
HtmlContext::Text, $e->getContent()) is likely to be seen, but I don't
foresee many situations where text and attribute strings end up
in the same code path.
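
To make the comparison concrete, here is a rough sketch of the two call
styles. Everything in it is hypothetical: the Sketch namespace, the
HtmlContext enum and all three function names only mirror the names used in
this thread, and html_entity_decode() is a stand-in that applies the same
rules in both contexts, so it does not implement the context-specific
ambiguous-ampersand handling the RFC is about.

```php
<?php
declare(strict_types=1);

// Hypothetical namespace so the sketch cannot collide with a future
// built-in function of the same name.
namespace Sketch;

// Hypothetical enum mirroring the $context argument discussed above.
// Requires PHP 8.1 or later.
enum HtmlContext
{
    case Attribute;
    case Text;
}

// Single-function style. ASSUMPTION: html_entity_decode() is only a
// placeholder; it ignores $context and decodes both cases identically.
function decode_html(HtmlContext $context, string $html): string
{
    return html_entity_decode($html, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

// Two-function style: the context is carried by the function name.
function decode_html_text(string $html): string
{
    return decode_html(HtmlContext::Text, $html);
}

function decode_html_attribute(string $html): string
{
    return decode_html(HtmlContext::Attribute, $html);
}

// Call sites side by side.
echo decode_html(HtmlContext::Text, 'Fish &amp; chips'), "\n";
echo decode_html_text('Fish &amp; chips'), "\n";
echo decode_html_attribute('?a=1&amp;b=2'), "\n";
```

With the split names, a call site never mentions the enum. The single
function only pays off when the context is chosen at runtime, as in the
ternary from point 3.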

A couple of other options that would satisfy anyone opposed to implicitly
favouring UTF-8:
html_text_to_utf8 and html_attribute_to_utf8

Best,
Jakob
