Providing improved functionality for escaping html (and other) output.

From: Date: Sat, 05 Jan 2013 03:59:33 +0000
Subject: Providing improved functionality for escaping html (and other) output.
Groups: php.internals 
It's important to escape output according to context. PHP provides
functions such as htmlspecialchars() to escape output when the context
is HTML. However, one often wants to allow some subset of HTML
through without escaping (e.g., <br />, <b></b>).

Functions such as strip_tags() do allow whitelisting, but their usage
poses security risks because attributes on allowed tags are left in
place (e.g., strip_tags('<b onclick="alert(\'Oh no!\')">click me</b>',
'<b>') returns the tag with its onclick handler intact).
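
To illustrate the risk, here is a minimal sketch (the variable names
are only for illustration). It passes the example string through
strip_tags() with <b> whitelisted and shows that the attribute
survives:

```php
<?php
// Input containing an allowed tag that carries an event-handler attribute.
$input = '<b onclick="alert(\'Oh no!\')">click me</b>';

// Whitelist <b>. strip_tags() keeps allowed tags exactly as written,
// attributes included, so the onclick handler is not removed.
$output = strip_tags($input, '<b>');

echo $output, PHP_EOL;
```

Because the allowed tag is passed through untouched, the output is the
same string as the input.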

One can develop a more robust mechanism in userland that first escapes
input using htmlspecialchars() and then unescapes whitelisted
sequences. Because HTML tags vary with their attributes (e.g., optional
classes, img src attributes, etc.), the ability to specify a
whitelisted sequence as a regex could also offer significant benefits
(e.g., any sequence starting and ending with '/' would be handled as a
regex). However, the common nature of this need, coupled with the
performance benefits of implementing it internally, prompts my
interest in two options.
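
The escape-then-unescape approach described above could be sketched in
userland roughly as follows. This is only an illustration under stated
assumptions: the function name escape_html_with_whitelist() is
hypothetical (it is not the extension's function), and the choice to
match regex entries against the already-escaped text is an assumption
about how such a mechanism might work.

```php
<?php
// Hypothetical helper: escape everything, then restore whitelisted sequences.
function escape_html_with_whitelist(string $input, array $whitelist): string
{
    // Step 1: escape the whole input, quotes included.
    $escaped = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

    foreach ($whitelist as $sequence) {
        $isRegex = strlen($sequence) > 2
            && $sequence[0] === '/'
            && substr($sequence, -1) === '/';

        if ($isRegex) {
            // Assumption: the regex is written against the ESCAPED text,
            // and each match is decoded back to raw HTML.
            $result = preg_replace_callback(
                $sequence,
                function (array $m): string {
                    return htmlspecialchars_decode($m[0], ENT_QUOTES);
                },
                $escaped
            );
            // preg_replace_callback() returns null on error; keep the
            // fully escaped text in that case.
            $escaped = $result ?? $escaped;
        } else {
            // Literal entry: swap its escaped form back to the raw form.
            $needle = htmlspecialchars($sequence, ENT_QUOTES, 'UTF-8');
            $escaped = str_replace($needle, $sequence, $escaped);
        }
    }

    return $escaped;
}

// Usage: only exact whitelisted sequences come back unescaped.
echo escape_html_with_whitelist(
    '<b onclick="alert(1)">x</b><br />',
    ['<b>', '</b>', '<br />']
), PHP_EOL;
```

In the usage line, the opening <b onclick=...> tag stays escaped because
it does not exactly match the whitelisted '<b>', which is the behavior
strip_tags() lacks. A regex entry needs to be written narrowly, since
anything it matches is restored as raw HTML.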

- Add a fifth parameter to htmlspecialchars() that takes an array of
whitelisted sequences. Even though this makes for a terribly long
function call, one could easily wrap the call in a facade function.

- Add a new function called str_escape(). This introduces potential BC
issues, since existing userland code may already define a function
with that name.

There are, of course, other options (e.g., integrating this as an
additional filter).

I've built an extension that, while focused on an old web framework of
mine, contains a function that can serve as a proof of concept for the
functionality I've outlined above (see nephtali_str_escape_html):

https://github.com/AdamJonR/nephtali-php-ext/blob/master/nephtali.c

I've tossed out the idea on this list before, but it was only
tangentially related to the discussion at the time. At this point, I'd
really like to focus on this idea directly to see what approach might
seem wisest (including doing nothing, if the frequency of use does not
justify bringing the functionality into the core).

Thoughts?

Adam

