Re: ENT_ALL or similar option for htmlspecialchars[_decode]?

From: Gustavo Lopes
Date: Fri, 28 Jun 2013 05:21:54 +0000
Subject: Re: ENT_ALL or similar option for htmlspecialchars[_decode]?
Groups: php.internals
On 2013-06-28 4:10, Kris Craig wrote:
On Thu, Jun 27, 2013 at 6:43 PM, Yasuo Ohgaki <[email protected]> wrote:
2013/6/27 Kris Craig <[email protected]>
Yeah, I tried html_entity_decode already, but it just returned NULL. On the same input string, htmlspecialchars_decode returned the input string with *some* special characters decoded; 10 and 13 ("\r\n", I think) were left in their encoded state. I'm not sure why there wouldn't be an option to decode all HTML special characters.
You are missing the design purpose of htmlspecialchars_decode() and html_entity_decode(). Truth is, they are not as useful as they might seem. Their purpose is not to decode all entities the way a browser would. We do not implement anything approaching the sort of parsing a browser does; for instance, HTML5 says you should accept certain entities not terminated with ";" and parse the stream in a certain way, and we don't do that at all. The purpose of those two functions is just to provide something approaching an inverse function for htmlspecialchars() and htmlentities(), respectively. html_entity_decode() has deviated somewhat from this (for instance, it decodes all numeric entities), but I think this is nevertheless the proper way to think about those two functions.
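To illustrate the "inverse function" view, here is a minimal sketch of how the two functions treat the same input. The sample string is hypothetical, and the flags (ENT_QUOTES, ENT_HTML401, UTF-8) are just one reasonable choice, not a recommendation.

```php
<?php
// Hypothetical input mixing the special characters, a named entity
// and a numeric entity.
$encoded = '&lt;b&gt;caf&eacute;&lt;/b&gt; &amp; tea&#10;';

// Only undoes what htmlspecialchars() produces (&amp; &lt; &gt; and quotes).
$special = htmlspecialchars_decode($encoded, ENT_QUOTES);

// Undoes htmlentities(): named entities too, plus numeric entities.
$entities = html_entity_decode($encoded, ENT_QUOTES | ENT_HTML401, 'UTF-8');

var_dump($special);   // "&eacute;" and "&#10;" are left encoded
var_dump($entities);  // "é" and a real line feed appear in the output
```

Neither call attempts browser-style recovery, such as accepting an entity that is missing its terminating ";".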
Not only HTML entities; we really need to add several decoders/encoders to core. For instance, JavaScript \uXXXX, HTML &#XX/&#XXXX, etc. I hope someone is working on it :)
Would you be interested in co-authoring an RFC with me for this?
See http://php.net/manual/en/transliterator.transliterate.php
For HTML entities, out of the box, only a transliterator for numeric entities is provided (hex-any/XML10), but you can easily build your own ruleset for the named entities. The performance will be below that of a dedicated algorithm, though. And it only supports UTF-8.
-- Gustavo Lopes
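A rough sketch of the transliterator approach, assuming the intl extension is loaded. The function name decode_entities_sketch() and the two-entity rule set are made up for illustration; a real rule set would need an entry for every named entity you care about.

```php
<?php
// Hypothetical rule set covering just two named entities.
// "&" and ";" are special in ICU rule syntax, so the literals are quoted.
const NAMED_ENTITY_RULES = "'&eacute;' > 'é'; '&amp;' > '&';";

// Hypothetical helper: numeric (decimal) entities first, then named ones.
function decode_entities_sketch(string $text): string
{
    // Built-in ICU transliterator for decimal references such as "&#10;".
    $numeric = Transliterator::create('Hex-Any/XML10');
    // Custom transliterator built from the rules above.
    $named = Transliterator::createFromRules(NAMED_ENTITY_RULES);
    if ($numeric === null || $named === null) {
        throw new RuntimeException(intl_get_error_message());
    }

    $step = $numeric->transliterate($text);
    if ($step === false) {
        throw new RuntimeException($numeric->getErrorMessage());
    }
    $result = $named->transliterate($step);
    if ($result === false) {
        throw new RuntimeException($named->getErrorMessage());
    }
    return $result;
}

if (extension_loaded('intl')) {
    var_dump(decode_entities_sketch('caf&eacute; &amp; tea&#10;'));
}
```

Because this runs two passes, an input like "&#38;amp;" ends up decoded twice, which is one more reason to treat it as a sketch and not a replacement for a dedicated decoder.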
