Re: ENT_ALL or similar option for htmlspecialchars[_decode]?
On 2013-06-28 4:10, Kris Craig wrote:
On Thu, Jun 27, 2013 at 6:43 PM, Yasuo Ohgaki <[email protected]> wrote:
2013/6/27 Kris Craig <[email protected]>
Yeah I tried html_entity_decode already, but it just returned NULL. On
the same input string, htmlspecialchars_decode returned the input string
but with *some* special characters decoded; 10 and 13 ("\r\n", I think)
were left in their encoded state. I'm not sure why there wouldn't be an
option to decode all html special characters.
You are missing the design purpose of htmlspecialchars_decode and html_entity_decode. The truth is, they are not as useful as they might seem. Their purpose is not to decode all entities the way a browser would. We do not implement anything approaching the sort of parsing a browser does; for instance, HTML5 says you should accept certain entities not terminated with ; and parse the stream in a certain way, and we don't do that at all. The purpose of those two functions is just to provide something approaching an inverse function for htmlspecialchars() and htmlentities(). html_entity_decode() has deviated somewhat from this (for instance, it decodes all numeric entities), but I think this is nevertheless the proper way to think about those two functions.
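To make the "approximate inverse" point concrete, here is a minimal sketch. The input strings and variable names are made up for illustration, and the flags (ENT_QUOTES | ENT_HTML401) and the UTF-8 charset are passed explicitly as an assumption, so the result does not depend on a given PHP version's defaults.

```php
<?php
// Flags and charset are passed explicitly so the result does not depend on
// the defaults of the PHP version in use (an assumption of this sketch).
$flags = ENT_QUOTES | ENT_HTML401;

// 1. Round trip: htmlspecialchars_decode() undoes htmlspecialchars().
$raw       = 'Tom & Jerry <b>"quoted"</b>';
$encoded   = htmlspecialchars($raw, $flags, 'UTF-8');
$roundTrip = htmlspecialchars_decode($encoded, $flags);

// 2. Numeric entities that do not stand for one of the special characters
//    are left alone by htmlspecialchars_decode(), but are decoded by
//    html_entity_decode().
$input       = 'a &lt; b&#10;caf&#233;';
$specialOnly = htmlspecialchars_decode($input, $flags);
$allEntities = html_entity_decode($input, $flags, 'UTF-8');

// 3. A named entity without the terminating ";" is not decoded, unlike
//    what an HTML5 parser would do.
$unterminated = html_entity_decode('a &lt b', $flags, 'UTF-8');

var_dump($roundTrip === $raw);
var_dump($specialOnly);
var_dump($allEntities);
var_dump($unterminated);
```

Case 2 matches the behaviour reported earlier in the thread: htmlspecialchars_decode() only targets what htmlspecialchars() produces, so a numeric reference such as &#10; passes through untouched.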
Not only HTML entities; we really need to add several decoders/encoders to
core.
For instance, JavaScript \uXXXX, HTML &#XX/&#XXXX, etc.
I hope someone is working on it :)
Would you be interested in co-authoring an RFC with me for this?
See http://php.net/manual/en/transliterator.transliterate.php
For HTML entities, out of the box, only a transliterator for numeric entities is provided (hex-any/XML10), but you can easily build your own ruleset for the named entities. The performance will be below that of a dedicated algorithm, though. And it only supports UTF-8.
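A rough sketch of that approach, assuming the intl extension is loaded and all strings are UTF-8. The two-entry ruleset and the sample input are hypothetical; a real ruleset would need one rule per named entity you want to support.

```php
<?php
// Assumes ext/intl is available; all strings are UTF-8.

// Built-in ICU transform for decimal numeric references such as &#233;.
$numeric = Transliterator::create('Hex-Any/XML10');

// Hypothetical ruleset covering only two named entities. The single quotes
// are ICU rule quoting, needed because "&" and ";" are special in rules.
$rules = "'&eacute;' > '\u{E9}'; '&euro;' > '\u{20AC}';";
$named = Transliterator::createFromRules($rules);

if ($numeric === null || $named === null) {
    throw new RuntimeException('Could not create transliterator: ' . intl_get_error_message());
}

$input   = 'caf&#233; costs 5&euro;, r&eacute;sum&eacute; free';
$decoded = $named->transliterate($numeric->transliterate($input));

var_dump($decoded);
```

Chaining two transliterators keeps the hand-written ruleset limited to named entities and leaves the numeric ones to the built-in transform.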
--
Gustavo Lopes