Hi Pierre,
On Mon, Jan 20, 2014 at 1:09 AM, Pierre Joye <[email protected]> wrote:
> On Thu, Jan 16, 2014 at 11:47 PM, Yasuo Ohgaki <[email protected]> wrote:
> > Hi Nikita,
> >
> > On Fri, Jan 17, 2014 at 7:38 AM, Nikita Popov <[email protected]> wrote:
> >
> >> No, I don't want a locale-based approach. I want the string functions to
> >> stay as is. Multibyte variants of the functions can be added to the
> >> multibyte extension.
> >
> >
> > Creating mb_*() function would not solve security issues of
> > multibyte char handling since multibyte aware functions are
> > optional feature.
>
> We never supported nor claimed that these functions are multi bytes
> safe. However I actually fully understand that we should solve this
> problem, in one way or another.
>
> > However, it may work if PHP compiles mbstring by default and
> > discourage use of addslashes()/var_export()/stripslashes()
> > in favor of mb_*() variants.
>
> I do not think we should discourage the use of these functions but
> clearly document to rely on mb_* APIs as long as multi bytes support
> is required.
>
> I join other about not making any optional arguments in the existing
> APIs, for a couple of reasons:
>
> 1. it does not solve anything as people still have to update their
> code, and they won't unless maybe if they read the doc/changelog
> 2. It is really not a clean solution
> 3. we already have many duplicate functions in mb, it has worked well
> so far and we can add the ones discussed here
>

I'll leave the existing ext/standard functions alone.

> The last question was about relying on locale. This is absolutely not
> a solution. Locale has been proven to be totally unreliable, buggy and
> unsafe. Let alone the total lack of real posix locale support on
> Windows.
>

mb_escape_shell_arg()/mb_escape_shell_cmd() need a locale-based
solution, since there is no good way to detect the terminal encoding. I'll
let the mb versions override this behavior when the encoding is specified
explicitly.

On UNIXes, UTF-8 is a popular terminal encoding, but there are
systems using other encodings such as EUC, or even SJIS or BIG5.
Windows uses a different terminal encoding depending on the locale,
so it is much more complex.
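
To illustrate why the terminal encoding matters, here is a minimal sketch. It is written in Python only to keep it short and runnable; it is not PHP code and not part of any proposal. The helper names are made up for illustration. It encodes one character in several encodings and checks whether a byte-wise scan would see an ASCII backslash (0x5C) inside it.

```python
# Sketch: the same character has different bytes in different encodings,
# and in Shift_JIS the trail byte can collide with ASCII backslash (0x5C).
BACKSLASH = 0x5C


def encoded_forms(ch, encodings=("utf-8", "euc_jp", "shift_jis")):
    """Return a dict mapping each encoding name to the encoded bytes of ch."""
    return {enc: ch.encode(enc) for enc in encodings}


def has_embedded_backslash(data):
    """True if a byte-wise scan of data would find a 0x5C byte."""
    return BACKSLASH in data


forms = encoded_forms("表")
for enc, data in forms.items():
    # Print the encoding, the bytes as hex, and whether 0x5C appears.
    print(enc, data.hex(), has_embedded_backslash(data))
```

A byte-oriented escaping function that does not know the encoding cannot tell such a trail byte from a real backslash, which is the kind of problem the explicit encoding is meant to avoid.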

This is the reason why I would use locale. However, this implementation
is debatable.

We could say "Users should explicitly specify the terminal encoding
themselves". In fact, I prefer this, even though I am about to implement
mb_escape_shell_*() using locale for automatic encoding detection.

It may be better to raise at least E_NOTICE if the encoding parameter is
omitted for mb_escape_shell_*().
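
The behavior I have in mind could look roughly like the sketch below. It is Python, not PHP, and only an illustration under assumptions: the function name, the POSIX-style single quoting, and the use of a warning in place of E_NOTICE are all stand-ins, not the actual mb_escape_shell_*() implementation.

```python
import locale
import warnings


def escape_shell_arg(arg, encoding=None):
    """Hypothetical sketch: quote arg for a POSIX shell and return bytes.

    If encoding is omitted, fall back to the locale's preferred encoding
    and emit a warning (standing in for E_NOTICE).
    """
    if encoding is None:
        encoding = locale.getpreferredencoding(False)
        warnings.warn(
            "encoding omitted; falling back to locale encoding %s" % encoding,
            stacklevel=2,
        )
    # Quote at the character level first, then encode, so the escaping
    # never operates on raw multibyte bytes.
    quoted = "'" + arg.replace("'", "'\\''") + "'"
    return quoted.encode(encoding)
```

Calling escape_shell_arg("it's", "utf-8") takes the explicit path without a warning, while escape_shell_arg("abc") uses the locale and warns.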

> For anything related to locale, formats or encoding, we should rely on
> intl (ICU) and not on system's locale. This is the only way to be
> portable, safe and updated.

I agree.
I would also like to propose

https://wiki.php.net/rfc/altmbstring - ICU version of mbstring

for a future release. Most of the work has been done by Moriyoshi. We could
try to switch to it now, but I suppose there is not enough time for 5.6.
It is supposed to work mostly the same as the current mbstring. It may be
better to make the current mbstring a compile-time option in favor of the
ICU implementation.

Regards,
--
Yasuo Ohgaki
[email protected]