Re: [php6] Unicode support, options?

From: Lester Caine
Date: Fri, 21 Feb 2014 13:28:44 +0000
Subject: Re: [php6] Unicode support, options?
Groups: php.internals
Pierre Joye wrote:
> On Fri, Feb 21, 2014 at 1:04 PM, Lester Caine <[email protected]> wrote:
>> Pierre Joye wrote:
>>>> What do you understand by "storage"?
>>> To have strings stored as UTF-8 only, with no conversion required for 99% of our use.
>> I think that the first thing that needs to be agreed on is whether there will be support for UTF-8 in the core. As has already been said, in many places this currently just works, so blocking it may be more of a problem now? Surely the question is "What is the 1% that needs some extra work?"
> I think we pretty much agree already that we need UTF-8 as the base, meaning strings are stored in UTF-8. Conversions may be needed for the advanced usages provided by ICU (or maybe not; I just do not know for sure right now).
>> A light library would be most appropriate for filling the gaps currently created by the use of UTF-8 strings in the core? It is not until one starts adding the mbstring level of string processing that a more powerful library is required. Something that simply ensures UTF-8 strings are valid and can carry out comparisons as required?
> It is more than only comparison. If it were only comparison, additions and the like, utf8proc would be enough, or librope with some additions. The only thing putting me off utf8proc is that it only supports Unicode 5.0.0.
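The "light" layer discussed above (check that a string is valid UTF-8, then work on it byte-wise) is small enough to sketch. Below is a minimal illustration in Python, chosen for readability; a core implementation would be written in C. It follows the well-formed UTF-8 byte-sequence rules of RFC 3629 and Chapter 3 of the Unicode Standard, which reject overlong forms, UTF-16 surrogates and values above U+10FFFF. It also counts code points to show that byte length, which is what `strlen()` reports, differs from character count. The names `is_valid_utf8` and `count_code_points` are hypothetical and are not part of utf8proc, librope, ICU or PHP.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Check data against the well-formed UTF-8 byte sequences (RFC 3629)."""
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:                       # 1 byte: ASCII
            i += 1
            continue
        # For a multi-byte lead byte: how many continuation bytes follow,
        # and the allowed range of the *first* continuation byte.
        if 0xC2 <= b <= 0xDF:              # 0xC0/0xC1 would be overlong
            need, lo, hi = 1, 0x80, 0xBF
        elif b == 0xE0:                    # exclude overlong 3-byte forms
            need, lo, hi = 2, 0xA0, 0xBF
        elif 0xE1 <= b <= 0xEC or 0xEE <= b <= 0xEF:
            need, lo, hi = 2, 0x80, 0xBF
        elif b == 0xED:                    # exclude surrogates U+D800..U+DFFF
            need, lo, hi = 2, 0x80, 0x9F
        elif b == 0xF0:                    # exclude overlong 4-byte forms
            need, lo, hi = 3, 0x90, 0xBF
        elif 0xF1 <= b <= 0xF3:
            need, lo, hi = 3, 0x80, 0xBF
        elif b == 0xF4:                    # nothing above U+10FFFF
            need, lo, hi = 3, 0x80, 0x8F
        else:                              # stray continuation byte or invalid lead
            return False
        if i + need >= n:
            return False                   # truncated sequence
        if not lo <= data[i + 1] <= hi:
            return False
        for k in range(i + 2, i + need + 1):
            if not 0x80 <= data[k] <= 0xBF:
                return False
        i += need + 1
    return True


def count_code_points(data: bytes) -> int:
    """Count code points in already-validated UTF-8: every byte that is
    not a continuation byte (10xxxxxx) starts a new code point."""
    return sum(1 for b in data if (b & 0xC0) != 0x80)


text = "naïve €".encode("utf-8")
print(is_valid_utf8(text), len(text), count_code_points(text))  # True 10 7
print(is_valid_utf8(b"\xc0\xaf"))      # False: overlong encoding of "/"
print(is_valid_utf8(b"\xed\xa0\x80"))  # False: UTF-16 surrogate U+D800
```

Concatenation, exact comparison and searching for ASCII delimiters work unchanged on validated UTF-8. Anything that measures or indexes by character rather than by byte falls into the "1%" that needs more than this.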
librope does not seem to understand any of the fine detail of the Unicode standards? What I've been looking for is the case-switching (case mapping) actions, and currently ICU is the only thing I can find that handles them.
The black hole is still 'case sensitivity', and perhaps laying down a 'light' set of rules for it would allow a path forward? As I have indicated, I'd prefer simply dropping case insensitivity, but a compromise might be to retain it only where the string length does not change and a clean reverse transform exists. So, a library that provides that comparison as part of the core package?
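The compromise rule above can be made concrete. The following is a minimal sketch in Python, for readability. It assumes that a "clean reverse transform" means a character's lower-case and upper-case forms are each a single code point, that they map back onto each other, and that the character is one of the two. Python's `str.lower()`/`str.upper()` stand in for the case tables a core library (or ICU) would supply. `has_clean_case_pair` and `ci_equal_light` are hypothetical names. Characters that fail the test are compared case-sensitively.

```python
def has_clean_case_pair(c: str) -> bool:
    """True if c's case mapping keeps length and round-trips cleanly.

    Assumed meaning of 'clean reverse transform': lower(c) and upper(c)
    are each one code point, they map back onto each other, and c is one
    of the two (which excludes titlecase letters such as U+01C5).
    """
    lo, up = c.lower(), c.upper()
    if len(lo) != 1 or len(up) != 1:
        return False  # length changes, e.g. "ß".upper() == "SS"
    return lo.upper() == up and up.lower() == lo and c in (lo, up)


def ci_equal_light(a: str, b: str) -> bool:
    """Case-insensitive equality restricted to 'clean' characters.

    Characters without a clean case pair must match exactly.
    """
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x == y:
            continue
        if has_clean_case_pair(x) and has_clean_case_pair(y) and x.lower() == y.lower():
            continue
        return False
    return True


print(ci_equal_light("Hello", "hELLO"))         # True
print(ci_equal_light("STRASSE", "straße"))      # False: lengths differ
print(ci_equal_light("\u212aelvin", "kelvin"))  # False: Kelvin sign has no clean pair
```

Under this rule, German "ß" (which upper-cases to "SS"), the Kelvin sign U+212A, Turkish dotted "İ" and final sigma "ς" all fall outside case-insensitive matching. That is the price of keeping the rules "light".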
> I do not care much about language-level support for UTF-8 names for methods, functions, variables etc. My take on it is that we should stick to ASCII for them and be done with that. But that's only my opinion :)
While I have no intention of using more than ASCII myself, I can see the argument for supporting more user-friendly names for functions and the like. I see the complaints about our current 'English' names and how they need improving, while at the same time I am dealing with customer sites where we provide simple aliases for all text in a local translation. That is easy enough in a relational database, where you simply select the right set of entries from a table, but not so easy for PHP ...
> We may end up writing our own library for the core operations... But I would prefer to avoid that, as it is really not a trivial task.
Totally agree ... but I don't see a good path yet?
While ICU creates its own complications, using ready-bundled versions, it is by far the cleanest code for both UTF-8 and, actually, UTF-32, if one simply ditches all the UTF-16 mess. I'd much rather start from that code than from any of the other libraries identified so far. In any case, I don't see any other option for the conversion process to and from UTF-8?

-- 
Lester Caine - G8HFL
-----------------------------
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk
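Conversion between UTF-8 and UTF-32 needs no tables. It is pure bit manipulation. Table-driven libraries such as ICU or iconv are needed for legacy character sets. The following is a minimal sketch in Python of the UTF-8/UTF-32 round trip. It assumes the UTF-8 input has already been validated. `utf8_to_utf32` and `utf32_to_utf8` are hypothetical names, not ICU functions.

```python
def utf8_to_utf32(data: bytes) -> list[int]:
    """Decode well-formed UTF-8 into code points (one UTF-32 unit each).

    Assumes the input has already been validated; no error checking here.
    """
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:            # 0xxxxxxx
            cp, extra = b, 0
        elif b < 0xE0:          # 110xxxxx + 1 continuation byte
            cp, extra = b & 0x1F, 1
        elif b < 0xF0:          # 1110xxxx + 2 continuation bytes
            cp, extra = b & 0x0F, 2
        else:                   # 11110xxx + 3 continuation bytes
            cp, extra = b & 0x07, 3
        for k in range(1, extra + 1):
            cp = (cp << 6) | (data[i + k] & 0x3F)  # append 6 payload bits
        out.append(cp)
        i += extra + 1
    return out


def utf32_to_utf8(code_points: list[int]) -> bytes:
    """Encode code points as UTF-8, rejecting surrogates and values > U+10FFFF."""
    out = bytearray()
    for cp in code_points:
        if 0xD800 <= cp <= 0xDFFF or not 0 <= cp <= 0x10FFFF:
            raise ValueError(f"not a Unicode scalar value: {cp:#x}")
        if cp < 0x80:
            out.append(cp)
        elif cp < 0x800:
            out += bytes((0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)))
        elif cp < 0x10000:
            out += bytes((0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F),
                          0x80 | (cp & 0x3F)))
        else:
            out += bytes((0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
                          0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)))
    return bytes(out)


encoded = "naïve € \U0001D11E".encode("utf-8")
cps = utf8_to_utf32(encoded)
print(utf32_to_utf8(cps) == encoded)  # True
```

UTF-32 spends 4 bytes per code point but gives constant-time indexing by character. UTF-8 is compact for mostly-ASCII text but needs a scan to find the n-th character.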
