Hi Mike,
On Sat, Jan 18, 2014 at 1:37 AM, Mike <[email protected]> wrote:
> On Fri, 2014-01-17 at 07:34 +0900, Yasuo Ohgaki wrote:
>
> > This discussion is going in circles.
>
> I'm also in favor of putting those things in ext/mbstring.
>
I'll make this a vote option.
> >
> > At first, I proposed a locale-based solution using php_mblen().
> > This approach does not require an additional encoding parameter
> > since the encoding is specified by the locale.
>
> Meh, but would be okay for me.
>
It's a feasible solution for older versions.
I would like to remove the locale-based code in future releases, though.
Functions that use php_mblen() could be modified to use mbstring
when PHP is built with mbstring. Such functions may use internal_encoding.
Use of internal_encoding requires user code modification in some cases.
For instance, the Japanese Windows command line uses Shift_JIS as its
terminal encoding, while many users use UTF-8 for their scripts. Those users
have to add code that changes internal_encoding before calling, e.g.,
escapeshellarg()/escapeshellcmd().
They could use a simple wrapper for escapeshellarg()/escapeshellcmd(), though.
Although users would have to modify their code a little, fgetcsv() and the like
would be more usable, because internal_encoding is more reliable than locale.
If we are not going to modify these functions, it may be better to add mb
versions of them and deprecate the originals, as with addslashes().
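A rough sketch of the kind of wrapper meant here, assuming escapeshellarg() were changed to honor internal_encoding as proposed (today it does not consult it). The name escapeshellarg_with_encoding() is hypothetical, and the sketch needs ext/mbstring:

```php
<?php
// Hypothetical wrapper: temporarily switch internal_encoding to the terminal
// encoding, escape the argument, then restore the previous setting.
// Assumption: escapeshellarg() honors internal_encoding (proposed behavior only).
function escapeshellarg_with_encoding($arg, $encoding)
{
    $previous = mb_internal_encoding();
    mb_internal_encoding($encoding);
    try {
        return escapeshellarg($arg);
    } finally {
        // Always restore, so the rest of the script keeps its own encoding.
        mb_internal_encoding($previous);
    }
}
```

A UTF-8 script running on a Japanese Windows console would then call escapeshellarg_with_encoding($arg, 'SJIS') instead of changing internal_encoding by hand around every call.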
>
> > However, some people (on the security ML) don't like the solution
> > because it is a locale-based solution. It may have unwanted side
> > effects. Locale is unreliable and most users just don't care about it.
> >
> > Therefore, I proposed this approach, which introduces an encoding
> > parameter just like htmlspecialchars()/htmlentities().
> >
> > An encoding parameter (or some way to specify the encoding) for
> > security-related string functions is mandatory. We should provide some
> > way to specify the encoding.
> >
>
> Many of us do not have access to that mailing list, so could you shed
> some light on the actual issue?
>
There are two classes of security issue in php_addslashes().
The first is PHP script execution.
Suppose this is a script that saves an app's config as a PHP script.
<?php
$v = addslashes('表') . addslashes('\'; exec(\'rm -rf /\'); die();\'');
file_put_contents('myconfig.php', '<?php $config='.$v);
?>
Then the app reads it back as a PHP script.
<?php
include 'myconfig.php';
// other code follows
?>
If '表' is SJIS, its char code is 0x955c (0x5c = \). Since addslashes() is
not multibyte aware, it escapes the char as 0x95, 0x5c, 0x5c. This makes it
possible to break out of the string quoting and write attack code.
The contents of myconfig.php become
<?php $config= '表'; exec('rm -rf /'); die()
with SJIS, BIG5 and other similar encodings.
var_export() can be attacked for the same reason and by the same method.
This attack method is well known to attackers in the East Asia region,
but it's not limited to East Asia.
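To make the byte-level behavior concrete, here is a minimal sketch that only inspects what addslashes() emits for the SJIS bytes of '表'. The variable names are just for illustration, and the bytes are written as hex escapes so the result does not depend on the script's own encoding:

```php
<?php
// '表' in Shift_JIS is the two bytes 0x95 0x5c; the trail byte equals '\'.
$sjisHyo = "\x95\x5c";

// addslashes() works byte by byte, so it escapes the trail byte on its own.
$escaped = addslashes($sjisHyo);
echo bin2hex($escaped), "\n";   // 955c5c

// With a quote appended, the output is 表's bytes, a doubled backslash
// and an escaped quote.
$withQuote = addslashes($sjisHyo . "'");
echo bin2hex($withQuote), "\n"; // 955c5c5c27
```

A reader that decodes these bytes as SJIS groups them as 955c (表), 5c5c (an escaped backslash) and 27 (a bare quote), so the quote ends the string.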
The second is rather obvious: it's a DoS.
Since the Zend engine raises a compile error for invalid encoding, data generated
by addslashes()/var_export() could stop script execution.
For instance, see
http://lxr.php.net/xref/PHP_5_5/Zend/zend_language_scanner.l#507
Although it would be rare in real code, php_stripslashes() could be
problematic, too.
Since some chars in SJIS-like encodings contain a special byte (0x5c), it could
be used to remove an escape character and cause problems. If the stripped string
is evaluated as a PHP script, it may cause the same issues as php_addslashes().
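The stripslashes() side can be seen the same way. This is a small sketch with illustrative variable names, again using hex escapes for the SJIS bytes of '表':

```php
<?php
// '表' in Shift_JIS: lead byte 0x95, trail byte 0x5c (the same byte as '\').
$sjis = "\x95\x5c" . "abc";

// stripslashes() treats the trail byte as an escape character and drops it.
$stripped = stripslashes($sjis);
echo bin2hex($stripped), "\n"; // 95616263
```

The lead byte 0x95 is left without its trail byte, so an SJIS-aware reader pairs it with whatever byte comes next.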
Anyway, AFAIK there isn't a feasible solution that could satisfy everyone.
Many users won't have to update their code, but some users may have to
modify their code depending on their usage.
Regards,
--
Yasuo Ohgaki
[email protected]