Hi Lester,
On Sat, Jan 18, 2014 at 10:41 PM, Lester Caine <[email protected]> wrote:
> Multibyte characters are still a contentious area, and the current
> compromise of supporting multibyte content, but being essentially 'single
> byte' for the programming structure has been a solution adopted in a few
> projects. Firebird is once again debating the same point that they and PHP
> last discussed 10 years ago, and was too difficult so PHP6 floundered and
> Firebird remained essentially single byte strings in the metadata.
>
Making a product that works only with single-byte characters is completely OK.

The issue is that there is no proper function/method/feature that correctly
escapes a PHP string containing multibyte chars. PHP needs to provide an API
that handles data properly and safely.

It's awful that reading var_export()ed data could execute arbitrary PHP
script and/or terminate script execution, isn't it? It cannot be ignored.
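
To illustrate the kind of mismatch I mean, here is a minimal sketch in
Python (not PHP, and not the code in the RFC). The helper names
escape_bytes and escape_chars are hypothetical. It assumes a Shift_JIS
string whose trail byte happens to be 0x5C, the same value as an ASCII
backslash.

```python
# Byte-wise vs. encoding-aware escaping of a Shift_JIS string.
# Illustrative only; function names are made up for this example.

def escape_bytes(raw: bytes) -> bytes:
    """Naive single-byte escaper: escapes every 0x5C and 0x27 byte."""
    return raw.replace(b"\\", b"\\\\").replace(b"'", b"\\'")


def escape_chars(raw: bytes, encoding: str) -> bytes:
    """Encoding-aware escaper: decode, escape characters, re-encode."""
    text = raw.decode(encoding)  # raises UnicodeDecodeError on malformed input
    text = text.replace("\\", "\\\\").replace("'", "\\'")
    return text.encode(encoding)


# U+8868 is encoded as the two bytes 0x95 0x5C in Shift_JIS.
sjis = "\u8868".encode("shift_jis")

naive = escape_bytes(sjis)               # the trail byte gets "escaped"
aware = escape_chars(sjis, "shift_jis")  # the character is left intact

print(sjis.hex(), naive.hex(), aware.hex())
```

The byte-wise version inserts an extra 0x5C after the character, so a
reader that understands the encoding sees one character followed by a
stray backslash. The encoding-aware version returns the bytes unchanged.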
> 10 years on isn't it time to re-open the debate on making the core unicode
> since 32 bit processors are more likely to be the norm these days.
> Certainly if everything internal is UTF8, then all of the encoding problems
> are moved to the client interface?
>
I'm not proposing a transition like Python 2.x to 3.x. The RFC proposes a
feature that is required for proper/safe coding. Anyway, it seems Python's
approach is not working well. We could learn from it.

The server side should never expect clients to send proper data, therefore
proper encoding handling is mandatory on the server side. Adopting UTF-8
makes things easier, but there are ways to exploit UTF-8 encoding as well.
For example, recent Chrome may display a blank page when it encounters
malformed chars, and this could be used for a DoS attack. Mixing systems
that validate encoding with systems that do not could be vulnerable to DoS.
The new mb functions handle encoding properly, not only for SJIS-like
encodings but for any encoding supported by mbstring.
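
As a rough sketch of what "validate on the server side" means, again in
Python rather than PHP: the helper name is_valid_encoding is hypothetical,
and it simply relies on a strict decode to reject malformed input. PHP's
mb_check_encoding() plays a similar role.

```python
# Reject input that is not well-formed in the expected encoding.
# Illustrative only; the function name is made up for this example.

def is_valid_encoding(raw: bytes, encoding: str = "utf-8") -> bool:
    """Return True only if raw decodes cleanly in the given encoding."""
    try:
        raw.decode(encoding, errors="strict")
    except UnicodeDecodeError:
        return False
    return True


well_formed = "\u65e5\u672c\u8a9e".encode("utf-8")  # three CJK characters
truncated = well_formed[:-1]                        # last sequence cut short
overlong = b"\xc0\xaf"                              # overlong encoding of "/"

print(is_valid_encoding(well_formed))
print(is_valid_encoding(truncated))
print(is_valid_encoding(overlong))
```

Input that fails this check should be rejected before it reaches any
escaping or output code.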

Did I make a typo? My Chrome did not report a spelling error. I would
appreciate it if you could point it out.

I sent the correct URL right after the first mail, but that did not seem to
work. I also would not check a second mail in a long thread :(
https://wiki.php.net/rfc/multibyte_char_handling
Regards,
--
Yasuo Ohgaki
[email protected]