Hi Johannes,
On Tue, Feb 25, 2014 at 8:39 PM, Johannes Schlüter
<[email protected]>wrote:
> On Tue, 2014-02-25 at 13:01 +0200, Stas Malyshev wrote:
> > Hi!
> >
> > > Engine supports SJIS/other encoding script execution. Therefore,
> >
> > I'm not sure what you mean by "other encoding script execution". Engine
> > execution is the same regardless of the data you use, nowhere in the
> > opcodes there's any encoding that is important.
>
> I think Yasuo refers to
> php -d zend.script_encoding=SJIS
>
Yes, it is.
There are users that uses SJIS and SJIS like encoding for PHP scripts. This
is nice feature for Windows users in east Asia.
> > addslashes()/var_export() behavior is security vulnerability just like
> > > encoding based SQL/JavaScript injections.
> >
> > I don't see how it is "therefore". If there's SQL injection or JS
> > injection that is not the result of wrong code, then let's outline them
> > and consider them. Specifically for var_export, which behavior is
> > broken? I see no mention of the specific examples in the RFC.
>
> i think Yasuo assumes that results from those operations with a
> script_encoding setting would be handled "correctly".
>
> I don't think we can do that, though. zend.script_encoding is a hardly
> documented feature which should be used with care.
>
> The documentation of addslashes() refers to "characters". I don't think
> the behavior should depend on PHP settings but like all "basic"
> functions assume on ASCII compatible single byte strings. Adding magic
> there makes it more confusing and harder to use. As said in a previous
> discussion on this topic rather than using addslashes users should use
> context-aware escaping functions. Using addslashes is almost always a
> bad idea.
>
I think many users realize this thanks to magic_quotes_gpc ;) Therefore,
many users know there would be issue with addslashes(), but not for
var_export(), I suppose.
The situation around var_export() is a bit more complicated.
> var_export() is used to create application configuration, cache data
> etc. so one might expect the PHP which created that to be able to read
> that, again. Doing this isn't easy, though, as it makes the generated
> file non-portable.
>
It would be portable as long as user uses correct encoding. The default
would be UTF-8 for 5.6 and ASCII for others in case if we modify
addslashes()/var_export(). (If mb_*(), internal_encoding setting is used)
Users who are not affected by this issue can use the default.
For affected users, all we have to do is ask users to specify proper
encoding so that php_addslashes() would not break structure of var_export()
data for certain encoding.
As you know, all databases' escaping functions have encoding parameter.
PostgreSQL uses encoding parameter stored in db connection structure. This
is the reason why pg_escape_string() has optional database base connection
parameter for escaping.
Adding "magic" into probably security relevant features is problematic
> and unless the engine is truly encoding-aware I'd abstain from such
> changes.
I agree. For example, pgsq/mysql/mysqli module does some magic by using and
assuming default connection's encoding parameter may be used to escape
string properly. If users are using multiple connections with different
encoding, it may cause problem. It wouldn't cause problem almost always
because chances are rare that users are using multiple encoding at once,
though.
Anyway, if we add mb_*() or modify existing functions, user has to specify
correct encoding when it differs from default encoding ("default_charset"
for 5.6 and up, mb_internal_encoding for less)
Although there would be code duplication, adding mb_*() seems to be the
best choice. Duplication may be removed with a little changes in API. This
cannot happen in released versions, but it is possible for 5.6. I should
write this in the RFC.
If this is fixed in all released versions, it's nicer for users and good
for us, but I'm not against fix this issue only in 5.6 and warn affected
users hoping nobody will be attacked.
Regards,
--
Yasuo Ohgaki
[email protected]