Re: [RFC] Improve HTML escape

From: Pádraic Brady
Date: Mon, 03 Feb 2014 11:06:22 +0000
Subject: Re: [RFC] Improve HTML escape
Groups: php.internals

Hi Stas,

On 3 February 2014 08:17, Stas Malyshev <[email protected]> wrote:
>
> Hi!
>
> > Users can do
> >
> > <tag attr='<?php echo htmlentities($str)?>' >
>
> They also can do <? echo $str; ?> and <? eval($_GET['f']); ?>. That's
> not what they _should_ be doing, but they _can_ do it. That doesn't mean
> there's something wrong with echo or PHP compiler.

I don't believe this has anything to do with the question at hand.

> > and this is valid. I think there is no reason not to escape ' by default.
> >
> > I agree that user should not use unquoted attributes in general.
> >
> > '/' escape could still be useful. For example, user may have validation
>
> I don't see how it would be useful.

I'm not sure it is either. OWASP definitely notes it, but it's not an
attribute termination character inside quotes.


> > There is no reason not to escape these chars by default. IMHO.
>
> There is a reason - there's no reason to escape them. In every scenario
> that htmlentities should be used, escaping them is useless. In every
> scenario where escaping / is useful, htmlentities should not be used. By
> promoting usage of htmlentities in scenarios where it should absolutely
> not be used, we are only doing the users a disservice.


There are three ways to present an attribute value validly in HTML5:

1. Double quoted
2. Single quoted
3. Unquoted.

Bearing in mind that people who use htmlentities() make a mockery of UTF-8
by overescaping and increasing output page size for no good reason
whatsoever, both htmlspecialchars() and htmlentities() only work by default
for the first option. They do not work by default for the last two options.

In userland, virtually all security-conscious libraries and frameworks cover
TWO options, 1 and 2, by setting ENT_QUOTES. It seems reasonable for PHP to
make the change also unless it has some hitherto unmentioned downside.
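
As a rough illustration of the difference (the value of $str is made up for
the example, and the flags are passed explicitly so the result does not
depend on whatever default the running PHP version uses):

```php
<?php
// Hypothetical attacker-controlled value, chosen only for illustration.
$str = "x' onmouseover='alert(1)";

// ENT_COMPAT: escapes double quotes but leaves single quotes alone.
$compat = htmlspecialchars($str, ENT_COMPAT, 'UTF-8');

// ENT_QUOTES: escapes both double and single quotes.
$quotes = htmlspecialchars($str, ENT_QUOTES, 'UTF-8');

// In a single-quoted attribute the first form lets the value close the
// attribute early; the second keeps it inside the quotes.
echo "<tag attr='{$compat}'>\n";
echo "<tag attr='{$quotes}'>\n";
```

Neither flag set touches spaces, so option 3 (unquoted) is not covered by
either call.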

Also, for reference, here is the actual paragraph from the OWASP XSS cheatsheet:

"Escape the following characters with HTML entity encoding to prevent
switching into any execution context, such as script, style, or event
handlers. Using hex entities is recommended in the spec. In addition to the
5 characters significant in XML (&, <, >, ", '), the forward slash is
included as it helps to end an HTML entity."

I read "entity" as "tag".


--
Pádraic Brady

http://blog.astrumfutura.com
http://www.survivethedeepend.com
Zend Framework Community Review Team
Zend Framework PHP-FIG Representative

