Re: [php6] Unicode support, options?

From: Lester Caine
Date: Fri, 21 Feb 2014 13:28:44 +0000

Subject: Re: [php6] Unicode support, options?

Groups: php.internals

Pierre Joye wrote:
> On Fri, Feb 21, 2014 at 1:04 PM, Lester Caine wrote:
>> Pierre Joye wrote:
>>
>>>> What do you understand by "storage"?
>>>
>>> To have strings stored as UTF-8 only, with no conversion required for 99%
>>> of our use.
>>
>> I think that the first thing that needs to be agreed on is whether there
>> will be support for UTF-8 in the core. As has already been said, in many
>> places this currently just works, so blocking it now may be more of a
>> problem? The question surely is "What is the 1% that needs some extra work?"
>
> I think we pretty much agree already that we need UTF-8 as the base,
> meaning strings are stored in UTF-8. Conversions may be needed for
> advanced usages provided by ICU (or maybe not, I just do not know for
> sure now).

>> A light library would be most appropriate for filling the gaps currently
>> created by the use of UTF-8 strings in the core? It is not until one
>> starts adding the mbstring level of string processing that a more powerful
>> library is required. Something that simply ensures UTF-8 strings are valid
>> and can carry out comparisons as required?
>
> It is more than only comparison. If it were only comparison, additions and
> the like, utf8proc would be enough, or librope with some additions.

The only thing putting me off utf8proc is that it only supports Unicode 5.0.0.
librope does not seem to understand any of the fine detail of the Unicode standard. What I've been looking for is the case-switching actions, and currently all I can find to handle that is ICU.
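For illustration only: the minimal validation step a "light" core library would need can be sketched in a few dozen lines. The Python below is not taken from utf8proc, librope or ICU, and `utf8_decode` is a hypothetical name. It assumes the well-formedness rules of the Unicode Standard (Table 3-7), rejecting overlong forms, encoded surrogates and anything above U+10FFFF, and returns the decoded code points. Comparison, case mapping and normalization are deliberately left out.

```python
def utf8_decode(data: bytes) -> list[int]:
    """Validate UTF-8 and return its code points; raise ValueError if ill-formed.

    Hypothetical helper following the Unicode well-formed byte sequence table:
    no overlong forms, no encoded surrogates, nothing above U+10FFFF.
    """
    out = []
    i, n = 0, len(data)
    while i < n:
        b0 = data[i]
        if b0 < 0x80:                           # 1 byte: plain ASCII
            out.append(b0)
            i += 1
            continue
        # Sequence length plus the allowed range for the *second* byte.
        if 0xC2 <= b0 <= 0xDF:
            length, lo, hi = 2, 0x80, 0xBF
        elif b0 == 0xE0:
            length, lo, hi = 3, 0xA0, 0xBF      # excludes overlong 3-byte forms
        elif 0xE1 <= b0 <= 0xEC or 0xEE <= b0 <= 0xEF:
            length, lo, hi = 3, 0x80, 0xBF
        elif b0 == 0xED:
            length, lo, hi = 3, 0x80, 0x9F      # excludes surrogates U+D800..U+DFFF
        elif b0 == 0xF0:
            length, lo, hi = 4, 0x90, 0xBF      # excludes overlong 4-byte forms
        elif 0xF1 <= b0 <= 0xF3:
            length, lo, hi = 4, 0x80, 0xBF
        elif b0 == 0xF4:
            length, lo, hi = 4, 0x80, 0x8F      # caps the result at U+10FFFF
        else:
            raise ValueError(f"invalid lead byte 0x{b0:02X} at offset {i}")
        if i + length > n:
            raise ValueError(f"truncated sequence at offset {i}")
        if not lo <= data[i + 1] <= hi:
            raise ValueError(f"invalid continuation byte at offset {i + 1}")
        for k in range(2, length):
            if not 0x80 <= data[i + k] <= 0xBF:
                raise ValueError(f"invalid continuation byte at offset {i + k}")
        # Payload bits: lead byte keeps (7 - length) bits, each continuation 6.
        cp = b0 & (0x7F >> length)
        for k in range(1, length):
            cp = (cp << 6) | (data[i + k] & 0x3F)
        out.append(cp)
        i += length
    return out
```

For example, `utf8_decode("Straße".encode())` yields the six code points of the word, while `utf8_decode(b"\xC0\xAF")` (an overlong encoding of "/") raises `ValueError`.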

>> The black hole is still 'case sensitivity', and perhaps laying down a
>> 'light' set of rules for it would allow a path forward? As I have
>> indicated, I'd prefer simply dropping case insensitivity, but a compromise
>> might be to retain it where a string's length does not change and a clean
>> reverse transform exists? So a library that provides that comparison as
>> part of the core package?
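One possible reading of this compromise, sketched in Python purely as an illustration (the helper names are hypothetical, and the exact rule is an assumption rather than something agreed in the thread): a character keeps case-insensitive behaviour only if its upper- and lower-case forms are each a single code point and map cleanly back onto each other; a string containing any other character falls back to exact, case-sensitive comparison. It relies on Python's `str.upper()` and `str.lower()`, which apply the Unicode full case mappings.

```python
def has_clean_case_pair(ch: str) -> bool:
    """True if ch's upper/lower forms are single code points that round-trip."""
    up, low = ch.upper(), ch.lower()
    if len(up) != 1 or len(low) != 1:
        return False                 # length changes, e.g. 'ß' -> 'SS'
    # Clean reverse transform: each form must map back onto the other.
    return up.lower() == low and low.upper() == up


def is_simple_caseless(s: str) -> bool:
    """True if every character of s satisfies the 'light' rule."""
    return all(has_clean_case_pair(ch) for ch in s)


def ci_equal(a: str, b: str) -> bool:
    """Case-insensitive only where the rule allows; otherwise an exact match."""
    if is_simple_caseless(a) and is_simple_caseless(b):
        return a.lower() == b.lower()
    return a == b
```

With this rule `ci_equal("strlen", "STRLEN")` is true, but `ci_equal("straße", "STRASSE")` is false because 'ß' upper-cases to two characters. The Kelvin sign (U+212A) and the Turkish 'İ' (U+0130) are rejected as well: the first does not round-trip, the second lower-cases to two code points.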

> I do not care much about language support for UTF-8 names for
> methods, functions, variables etc. My take on it is that we should
> stick to ASCII for them and be done with that. But that's only my
> opinion :)

While I have no intention of using more than ASCII myself, I can see the argument for supporting more user-friendly names for functions and the like. I see the complaints about our current 'English' names and how they need improving, while at the same time I am dealing with customer sites where we provide simple aliases for all text in a local translation. That is easy enough in a relational database, where you simply select the right set of entries from a table, but not so easy for PHP ...

> We may end up writing our own library for the core operations... But I
> would prefer to avoid that as it is really not a trivial task.

Totally agree ... but I don't see a good path yet?
While ICU creates its own complications when using ready-bundled versions, it is by far the cleanest code for both UTF-8 and actually UTF-32, if one simply ditches all the UTF-16 mess. I'd much rather start from that code than from any of the other libraries identified so far. In any case, I don't see any other option for the conversion process to and from UTF-8?
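The UTF-8/UTF-32 part of this can be sketched without any UTF-16 stage. The Python below is an illustration, not ICU code, and the function names are hypothetical. Encoding works directly on code points (Unicode scalar values), decoding delegates validation to Python's strict UTF-8 codec, and the fixed-width UTF-32 form is simply each code point packed as a 32-bit integer. Conversion to and from legacy character sets is a separate problem that needs mapping tables, and this sketch does not cover it.

```python
import struct


def utf32_to_utf8(code_points: list[int]) -> bytes:
    """Encode code points as UTF-8, rejecting surrogates and values > U+10FFFF."""
    out = bytearray()
    for cp in code_points:
        if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"not a Unicode scalar value: {cp:#x}")
        if cp < 0x80:                                   # 1 byte
            out.append(cp)
        elif cp < 0x800:                                # 2 bytes
            out += bytes((0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)))
        elif cp < 0x10000:                              # 3 bytes
            out += bytes((0xE0 | (cp >> 12),
                          0x80 | ((cp >> 6) & 0x3F),
                          0x80 | (cp & 0x3F)))
        else:                                           # 4 bytes
            out += bytes((0xF0 | (cp >> 18),
                          0x80 | ((cp >> 12) & 0x3F),
                          0x80 | ((cp >> 6) & 0x3F),
                          0x80 | (cp & 0x3F)))
    return bytes(out)


def utf8_to_utf32(data: bytes) -> list[int]:
    """Decode UTF-8 to code points; the strict codec raises on ill-formed input."""
    return [ord(ch) for ch in data.decode("utf-8")]


def utf32_le_bytes(code_points: list[int]) -> bytes:
    """Fixed-width storage form: one little-endian 32-bit unit per code point."""
    return struct.pack(f"<{len(code_points)}I", *code_points)
```

Each code point maps to exactly one 32-bit unit, which is the property that makes UTF-32 attractive for indexing; the cost is four bytes per character even for ASCII.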

-- 
Lester Caine - G8HFL
-----------------------------
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk

