On Jan 19, 2014 11:48 PM, "Yasuo Ohgaki" <[email protected]> wrote:
>
> Hi Pierre,
>
> On Mon, Jan 20, 2014 at 1:09 AM, Pierre Joye <[email protected]> wrote:
>>
>> On Thu, Jan 16, 2014 at 11:47 PM, Yasuo Ohgaki <[email protected]> wrote:
>> > Hi Nikita,
>> >
>> > On Fri, Jan 17, 2014 at 7:38 AM, Nikita Popov <[email protected]> wrote:
>> >
>> >> No, I don't want a locale-based approach. I want the string functions
>> >> to stay as is. Multibyte variants of the functions can be added to the
>> >> multibyte extension.
>> >
>> >
>> > Creating mb_*() functions would not solve the security issues of
>> > multibyte character handling, since the multibyte-aware functions
>> > are an optional feature.
>>
>> We never supported, nor claimed, that these functions are multibyte
>> safe. However, I fully understand that we should solve this
>> problem, one way or another.
>>
>> > However, it may work if PHP compiles mbstring by default and
>> > discourages the use of addslashes()/var_export()/stripslashes()
>> > in favor of mb_*() variants.
>>
>> I do not think we should discourage the use of these functions, but
>> we should clearly document that the mb_* APIs are to be used whenever
>> multibyte support is required.
>>
>> I agree with the others about not adding any optional arguments to the
>> existing APIs, for a couple of reasons:
>>
>> 1. It does not solve anything, as people still have to update their
>> code, and they won't unless maybe they read the docs/changelog.
>> 2. It is really not a clean solution.
>> 3. We already have many duplicate functions in mb; it has worked well
>> so far, and we can add the ones discussed here.
>
>
> I'll leave existing ext/standard functions alone.
:)
>> The last question was about relying on locale. This is absolutely not
>> a solution. Locale has proven to be totally unreliable, buggy and
>> unsafe, let alone the total lack of real POSIX locale support on
>> Windows.
>
>
> mb_escape_shell_arg()/mb_escape_shell_cmd() need a locale-based
> solution, since there isn't a good way to detect the terminal encoding.
> I'll let the mb versions override this behavior when an encoding is
> specified explicitly.
>
Sounds good
> On UNIX systems, UTF-8 is a popular terminal encoding, but there
> are systems using other encodings such as EUC, or even SJIS or BIG5.
Right, and as far as I remember UTF-8 does not suffer from this problem.
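To make the difference concrete, here is a rough byte-level sketch in Python (not PHP, and not the real addslashes()). naive_addslashes is a hypothetical stand-in that only models escaping of quote and backslash bytes. It shows a Shift_JIS trail byte colliding with the backslash, which cannot happen in UTF-8 because every byte of a multibyte UTF-8 sequence is 0x80 or above.

```python
def naive_addslashes(data: bytes) -> bytes:
    """Hypothetical model of a byte-oriented escaper that ignores encodings."""
    out = bytearray()
    for b in data:
        # Prefix backslash, single quote and double quote bytes with 0x5C.
        if b in b"\\'\"":
            out.append(0x5C)
        out.append(b)
    return bytes(out)


text = "表"  # U+8868

sjis = text.encode("shift_jis")  # two bytes, the second one is 0x5C
utf8 = text.encode("utf-8")      # three bytes, all of them >= 0x80

sjis_escaped = naive_addslashes(sjis)  # an extra 0x5C is inserted
utf8_escaped = naive_addslashes(utf8)  # unchanged

print(sjis.hex(), "->", sjis_escaped.hex())
print(utf8.hex(), "->", utf8_escaped.hex())
```

The Shift_JIS input grows by one byte because its trail byte is mistaken for a backslash, while the UTF-8 input passes through untouched.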
> Windows uses a different terminal encoding depending on the locale,
> so it's much more complex.
>
Let me provide a function to detect it, but we need something to normalize
the names. Do we have such a thing in mbstring?
> This is the reason why I would use locale. However, this implementation
> is debatable.
>
Yes :)
> We could say "Users should explicitly specify the terminal encoding
> themselves". In fact, I prefer this, even though I am about to implement
> mb_escape_shell_*() using locale for automatic encoding detection.
>
> It may be better to raise at least an E_NOTICE if the encoding
> parameter is omitted for mb_escape_shell_*().
Notice sounds good too.
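As a rough illustration of that idea, here is a sketch in Python rather than PHP. escape_shell_arg and its encoding parameter are hypothetical names for this example, not the proposed mb_escape_shell_*() signature. The locale is only a fallback, and a Python warning stands in for E_NOTICE.

```python
import locale
import shlex
import warnings
from typing import Optional


def escape_shell_arg(arg: str, encoding: Optional[str] = None) -> bytes:
    """Quote arg for a POSIX shell, then encode it for the terminal."""
    if encoding is None:
        # Stand-in for E_NOTICE: the caller did not say which encoding to use.
        warnings.warn(
            "encoding omitted, falling back to the locale's preferred encoding",
            stacklevel=2,
        )
        encoding = locale.getpreferredencoding(False)
    # Quoting works on characters, before encoding, so the quoting step never
    # sees a trail byte that happens to equal an ASCII metacharacter.
    return shlex.quote(arg).encode(encoding)


# Explicit encoding: no warning, no dependence on the locale.
explicit = escape_shell_arg("表 1", encoding="shift_jis")
print(explicit)
```

Calling it without the encoding argument still works, but emits the warning and depends on whatever the process locale reports.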
>
>> For anything related to locale, formats or encoding, we should rely on
>> intl (ICU) and not on the system's locale. This is the only way to be
>> portable, safe and up to date.
>
>
> I agree.
> I also would like to propose
>
> https://wiki.php.net/rfc/altmbstring - ICU version of mbstring
>
Oh, very nice.
> for a future release. Most of the work has been done by Moriyoshi. We may
> try to switch to it now, but I suppose there is not enough time for 5.6.
What's the status? We still have some time :)
Cheers,
Pierre