Re: ENT_ALL or similar option for htmlspecialchars[_decode]?

From: Gustavo Lopes Date: Fri, 28 Jun 2013 05:21:54 +0000

Subject: Re: ENT_ALL or similar option for htmlspecialchars[_decode]?

References: 1 2 3 4 5 Groups: php.internals

Request: Send a blank email to [email protected] to get a copy of this message

On 2013-06-28 4:10, Kris Craig wrote:
On Thu, Jun 27, 2013 at 6:43 PM, Yasuo Ohgaki <[email protected]> wrote:

2013/6/27 Kris Craig <[email protected]>

Yeah I tried html_entity_decode already, but it just returned NULL. On
the same input string, htmlspecialchars_decode returned the input string
but with *some* special characters decoded; 10 and 13 ("\r\n", I think)
were left in their encoded state.  I'm not sure why there wouldn't be an
option to decode all html special characters.


You are missing the design purpose of htmlspecialchars_decode() and html_entity_decode(). Truth is, they are not as useful as they might seem. Their purpose is not to decode all the entities, like a browser would do. We do not implement anything approaching the sort of parsing a browser would do; for instance, HTML5 says you should accept certain entities not terminated with ; and parse the stream in a certain way, and we don't do that at all. The purpose of those two functions is just to provide something approaching an inverse function for htmlspecialchars() and htmlentities(). html_entity_decode() has somewhat deviated from this (for instance, it decodes all numeric entities), but I think this is nevertheless the proper way to think about those two functions.
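To make the distinction above concrete, here is a minimal sketch. The sample strings and variable names are illustrative assumptions, not taken from the original report, and the flags shown are just one reasonable choice.

```php
<?php
// Round trip: htmlspecialchars_decode() is meant to undo htmlspecialchars().
$raw       = "Tom & \"Jerry\" <b>";
$encoded   = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');
$roundTrip = htmlspecialchars_decode($encoded, ENT_QUOTES);
var_dump($roundTrip === $raw);

// Hypothetical input that mixes special characters, a named entity
// and the numeric entities 13 and 10.
$input = '&lt;p&gt;caf&eacute;&#13;&#10;&lt;/p&gt;';

// Only the characters htmlspecialchars() would have produced are decoded;
// &eacute; and the numeric entities are left alone.
$specialOnly = htmlspecialchars_decode($input, ENT_QUOTES);

// Named entities known to the entity table and numeric entities are decoded.
$allEntities = html_entity_decode($input, ENT_QUOTES, 'UTF-8');

// Unlike a browser, an entity without the terminating ';' is not decoded.
$unterminated = html_entity_decode('&amp b', ENT_QUOTES | ENT_HTML5, 'UTF-8');

var_dump($specialOnly, $allEntities, $unterminated);
```

Neither call attempts browser-style recovery; both only look for well-formed entities in the string.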


Not only HTML entities; we really need to add several decoders/encoders to
core.
For instance, JavaScript \uXXXX, HTML &#XX/&#XXXX, etc.
I hope someone is working on it :)


Would you be interested in co-authoring an RFC with me for this?


See http://php.net/manual/en/transliterator.transliterate.php. For HTML entities, out of the box, only a transliterator for numeric entities is provided (hex-any/XML10), but you can easily build your own ruleset for the named entities. The performance will be below that of a dedicated algorithm, though. And it only supports UTF-8.
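A rough sketch of what that could look like, assuming the intl extension is loaded. The named-entity ruleset below is a tiny hand-written example covering four entities, not a complete table, and the sample strings are made up for illustration.

```php
<?php
// Built-in ICU transliterator for decimal numeric entities (&#NNN;).
$numeric = Transliterator::create('Hex-Any/XML10');
$decodedNumeric = $numeric->transliterate('a&#13;&#10;b&#233;');

// Built-in ICU transliterator for Java/JavaScript style \uXXXX escapes.
$java = Transliterator::create('Hex-Any/Java');
$decodedJava = $java->transliterate('caf\u00e9');

// Hand-written ruleset for a few named entities (illustrative only).
// Rule metacharacters such as & < > ; are quoted so they are taken literally.
$rules = "'&lt;' > '<'; '&gt;' > '>'; '&eacute;' > é; '&amp;' > '&';";
$named = Transliterator::createFromRules($rules);
$decodedNamed = $named->transliterate('&lt;b&gt;caf&eacute;&lt;/b&gt; &amp;lt;');

var_dump($decodedNumeric, $decodedJava, $decodedNamed);
```

The rules are applied in a single left-to-right pass, so the output of one rule is not decoded a second time: `&amp;lt;` becomes `&lt;` rather than `<`.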

-- 
Gustavo Lopes

Thread (10 messages)

Kris Craig  Thu, 27 Jun 2013 01:21:34 +0000
Yasuo Ohgaki  Thu, 27 Jun 2013 07:03:21 +0000
Kris Craig  Thu, 27 Jun 2013 08:42:26 +0000
Yasuo Ohgaki  Fri, 28 Jun 2013 01:43:15 +0000
Kris Craig  Fri, 28 Jun 2013 02:10:55 +0000
Gustavo Lopes  Fri, 28 Jun 2013 05:21:54 +0000
Tjerk Anne Meesters  Fri, 28 Jun 2013 02:54:41 +0000
Kris Craig  Fri, 28 Jun 2013 04:20:52 +0000
Kris Craig  Fri, 28 Jun 2013 04:38:17 +0000
Tjerk Anne Meesters  Fri, 28 Jun 2013 04:47:01 +0000
