Re: Re: [php6] Unicode support, options?
Pierre Joye wrote:
That whatever is used will need to be both tailored for PHP and transparent
as far as ICU is concerned is, as you have identified, a given. ICU is still
built using 32-bit string lengths (I think?), which does add to the fun, but
I don't see any reason not to use functions like compareUTF8() and
ucasemap_utf8ToLower() from ICU, in which case the strings need to be
standard ICU UTF-8 strings? I can see the advantage of the 'fast' compare
that I have been banging on about elsewhere, which looks for a simple match
between two raw strings of bytes. UTF-8 only comes into that when you need
to add 'rank', i.e. a proper ordering? But much of the core processing CAN
simply ignore that, as long as the generic calls don't have dead tails which
activate it?
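
To illustrate what I mean by the 'fast' compare, here is a minimal sketch in C. The function names raw_bytes_equal() and raw_bytes_order() are made up for this example and are not part of PHP or ICU. It only assumes that both strings are already in the same encoding and carry an explicit byte length.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 when the two byte strings are identical,
 * 0 otherwise. No decoding takes place, so the encoding is irrelevant. */
int raw_bytes_equal(const char *a, size_t a_len,
                    const char *b, size_t b_len)
{
    if (a_len != b_len) {
        return 0;               /* different lengths can never match */
    }
    return memcmp(a, b, a_len) == 0;
}

/* Hypothetical helper: plain binary ordering (<0, 0, >0). For valid UTF-8
 * this is code point order, NOT a locale-aware 'rank'; that is the point
 * where a collator would have to take over. */
int raw_bytes_order(const char *a, size_t a_len,
                    const char *b, size_t b_len)
{
    size_t n = a_len < b_len ? a_len : b_len;
    int r = memcmp(a, b, n);    /* compares bytes as unsigned char */
    if (r != 0) {
        return r;
    }
    /* common prefix is equal: the shorter string sorts first */
    return (a_len > b_len) - (a_len < b_len);
}
```

Note that an equality test like this treats differently normalised forms of the same text as different strings, so it is only a fast path for the simple match and not a replacement for a collator.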
We may use our own functions (or another lib) to cover operations not
implemented in ICU or too slow because of the conversions. That's why
investigating other tools is still a good thing to do.
The bit I'm still missing here is 'operations not implemented in ICU'?
As soon as conversions are required, speed is always going to be compromised, but where the platform is already UTF-8 based, which is increasingly common, all we are looking for is to handle UTF-8 strings quickly. For the best performance, conversions can simply be avoided. So I'm currently treating conversion as a secondary problem (probably less important than case!) and just trying to identify what is missing from ICU's UTF-8 support that needs to be added?
It may well be that Windows is a special case that needs its own conversion layer, but that should not form part of any core upgrade. It is not needed for many installations?
--
Lester Caine - G8HFL
-----------------------------
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk