Re: [VOTE] RFC: Multibyte Char Handling

Date: Mon, 27 Jan 2014 00:09:44 +0000
Subject: Re: [VOTE] RFC: Multibyte Char Handling
Groups: php.internals

Hi Dan,

On Mon, Jan 27, 2014 at 7:28 AM, Dan Ackroyd wrote:

> Sorry, for rapid posting but I realise I should have been explicit in
> my last message. The RFC being voted on has "Backward Incompatible
> Changes: None."
>
> The referenced RFC which apparently would be included has:
>
> "Removed (deprecated) functions and reasons behind it
>
> mb_check_encoding() – Not that usable as it is advertised, period..
> First of all, validation in terms of encoding is just as same as
> filtering through the converter supplied with the same value for the
> input and output encoding. Thus just use mb_convert_encoding().
> mb_convert_case() – Use mb_strtoupper(), mb_strtolower() and
> mb_strtotitle()
> mb_convert_kana() – This can't be standard-compliant. In addition,
> part of the functionality is already covered by Normalizer of intl
> extension, so we need to carefully consider what is actually needed
> here again.
> mb_convert_variables() – This can be implemented as a script.
> mb_decode_mimeheader() and mb_encode_mimeheader() – Non-standard
> compliancy.
> mb_decode_numericentity() – Removed in favor of html_entity_decode().
> mb_encode_numericentity() – Removed in favor of htmlentities() and
> htmlspecialchars().
> mb_encoding_aliases() – Just unnecessary.
> mb_ereg_match() – Use mb_ereg()
> mb_ereg_search(), mb_ereg_search_getpos(), mb_ereg_search_getregs(),
> mb_ereg_search_init(), mb_ereg_search_pos(), mb_ereg_search_regs() and
> mb_ereg_search_setpos() – I rarely heard a script that actively uses
> these functions. They involve an internal state that is not visible to
> users, and thus it most likely causes confusion when used across the
> function calls. Need to be reimplemented as a class.
> mb_eregi() – Use mb_regex_options() and mb_ereg()
> mb_eregi_replace() – I wonder why this function was added in the first
> place because giving 'i' option to mb_ereg_replace() works in the same
> way.
> mb_detect_order(), mb_get_info(), mb_http_input(), mb_http_output(),
> mb_language() and mb_substitute_character() – ini_set() and ini_get()
> are your friends, I guess…
> mb_regex_encoding() – It is really confusing that the current mbstring
> allows two different encoding defaults for regex functions and the
> rest. Those settings are unified in the alternative version and so
> this is no longer necessary.
> mb_send_mail() – The behavior of this function relies on the
> pseudo-locale setting called “mbstring.language” that supports just a
> limited set of possible locales. As not everyone can benefit from the
> function and most significant applications implement their own mail
> functions, I suppose this is no longer wanted.
> mb_strrchr() – Use mb_strrpos().
> mb_strrichr() – Use mb_strripos()."
>
> None is not the same as a huge number of function changes.
>

I just didn't want to touch a five-year-old RFC.
As I wrote in the parent RFC, the implementation is subject to change.

The objective of this RFC is to eliminate the vulnerability completely.
It's better to have a road map for that.

As I wrote, there is a licensing difficulty with compiling the current
mbstring by default.
There is mbstring-ng, but it's incomplete. This RFC is only proposing
a feasible option.

I'm going to copy all mbstring features to mbstring-ng, but there may be
some compatibility issues, e.g. in the handling of non-character
encodings.

There will be another vote when we actually replace mbstring with
mbstring-ng, since this RFC only proposes the way to go. I don't think
this RFC is an approval of the replacement.

Regards,

--
Yasuo Ohgaki

