Re: [RFC] Improve HTML escape
Hi Lester,
On 3 February 2014 23:22, Lester Caine <[email protected]> wrote:
> Yasuo Ohgaki wrote:
>>
>> I'm lost here.
>> OWASP suggests to escape at least
>>
>>  & --> &amp;
>>  < --> &lt;
>>  > --> &gt;
>>  " --> &quot;
>>  ' --> &#x27;     &apos; not recommended because its not in the HTML spec
>>  (See: section 24.4.1) &apos; is in the XML and XHTML specs.
>>  / --> &#x2F;     forward slash is included as it helps end an HTML
>>  entity
>>
>>
>> https://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)_Prevention_Cheat_Sheet#RULE_.231_-_HTML_Escape_Before_Inserting_Untrusted_Data_into_HTML_Element_Content
>>
>> I'm not sure why you state "already violate this requirement".
>
>
> It may be that what you are asking for is a flag on htmlentities for 'OWASP'
> compliant option. Others would probably view that as not then being html5
> compliant since html5 has its own list of 'escaped' characters. One of the
> irritating things I find is 'unescaping' a string does not return the
> original string simply because the html5 rule has not been followed! A clean
> html5 result should be the default.
OWASP compliance focuses on the special characters, which are the same
regardless of HTML spec. What is output MAY differ, which is why it
suggests something like hex encoding where differences between specs
exist.
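
A minimal sketch of that rule, written in Python purely for
illustration (it is not PHP's htmlentities/htmlspecialchars and not an
OWASP reference implementation). The function name owasp_html_escape
is hypothetical; the entity forms are the ones from the OWASP list
quoted above, with hex encoding for ' and / where the specs disagree
on a named entity.

```python
# Hypothetical helper: the name is made up for this sketch.
# The mapping follows the six characters in the OWASP list quoted above.
_OWASP_RULE1 = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#x27;",  # hex form, because &apos; is not in the HTML 4 spec
    "/": "&#x2F;",  # hex form for the forward slash
}

# Build the translation table once; keys must be single characters.
_TABLE = str.maketrans(_OWASP_RULE1)


def owasp_html_escape(text: str) -> str:
    """Escape untrusted text for insertion into HTML element content."""
    # str.translate maps each input character in a single pass, so the
    # "&" of an entity that was just emitted is never escaped again.
    return text.translate(_TABLE)
```

Because the hex forms are numeric character references, the escaped
output should decode the same way whichever HTML spec the consumer
follows.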
> Looking at the Rule 2 from the OWASP they are actually asking for every
> character below 256 to be escaped when used in an attribute! But the
> important thing here is 'untrusted' data, and sanitising any externally
> supplied data needs a little more care than simply trying to wrap it in
> htmlentities which I think is what Stas is saying? Personally I try to avoid
> any path where input can be processed direct back to output, filter the
> input, don't simply try and patch the output?
It's not a question of validating/filtering input. Handling input
gets it into the application, where Mystery Processes 1 through
Infinity are performed. Who knows what these Mystery Processes do? I
don't - I'm not writing everyone's application for them! They could be
grabbing data, transforming it, reading from the database, using a
Composer package replaced en route by the NSA, etc. Ergo, we escape on
output to HTML/JSON at all times and without exception. The same way
we escape on output, without exception, when the output target is a
database. Input and Output are like borders - nobody gets across them
without a customs check. It may seem unnecessary at times, but that's
because most of the point is to be consistent to a fault: to eliminate
the risk of any errors in those Mystery Processes and to guarantee
that the correct escaping is performed for the target - DB? JSON?
HTML? XML? RPC? Command Line?
It also helps not to have to dissect every single application route
just to figure out every input's output encoding... That just drives
me nuts, and I have seen it. It's easy to forget that other people
have to maintain and audit your applications, so go easy on them!
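
To make the "customs check at every output border" idea concrete, here
is a rough sketch in Python (used only for illustration; none of this
is PHP API). The function names and the notes table are hypothetical.
The point is only that each output target gets its own encoder, applied
at the moment of output, whatever happened to the value in between.

```python
import html
import json
import shlex
import sqlite3


def to_html(value: str) -> str:
    # HTML element/attribute content: escapes & < > and both quote types.
    return html.escape(value, quote=True)


def to_json(value: str) -> str:
    # JSON string literal. This is JSON encoding only; it is not enough
    # on its own for embedding inside an HTML <script> block.
    return json.dumps(value)


def to_shell_arg(value: str) -> str:
    # A single POSIX shell argument, quoted so the shell cannot split it
    # or interpret metacharacters.
    return shlex.quote(value)


def store_note(conn: sqlite3.Connection, body: str) -> None:
    # Database border: a bound parameter keeps the value out of the SQL
    # text entirely, so no manual quoting is needed.
    conn.execute("INSERT INTO notes (body) VALUES (?)", (body,))


def read_notes(conn: sqlite3.Connection) -> list:
    # Values come back exactly as stored; they still need escaping for
    # whichever target they are sent to next.
    return [row[0] for row in conn.execute("SELECT body FROM notes")]
```

The same value can then be handed to to_html(), to_json() or
to_shell_arg() at the point where it leaves the application, without
anyone having to trace where it came from first.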
Paddy
--
Pádraic Brady
http://blog.astrumfutura.com
http://www.survivethedeepend.com
Zend Framework Community Review Team
Zend Framework PHP-FIG Representative