Re: Re: [php6] Unicode support, options?

From: Lester Caine Date: Thu, 27 Feb 2014 10:51:50 +0000

Subject: Re: Re: [php6] Unicode support, options?

References: 1 2 3 4 Groups: php.internals

Request: Send a blank email to [email protected] to get a copy of this message

Pierre Joye wrote:
>> That whatever is used will need to be both tailored for PHP and transparent
>> as far as ICU is concerned is, as you have identified, a given. ICU is still
>> built using 32-bit string lengths (I think?), which does add to the fun, but
>> I don't see any reason not to be using functions like compareUTF8() and
>> ucasemap_utf8ToLower() from ICU, in which case the strings need to be
>> standard ICU UTF-8 strings? I can see the advantage of the 'fast' compare
>> that I have been banging on about elsewhere, which looks for a simple match
>> between two raw strings of bytes. UTF-8 only comes into that when you need
>> to add 'rank'? But much of the core processing CAN simply ignore that as
>> long as the generic calls don't have dead tails which activate it?

> We may use our own functions (or other libs) to cover operations not
> implemented in ICU, or that are too slow because of the conversions. That's
> why investigating other tools is still a good thing to do.

The bit I'm still missing here is which 'operations not implemented in ICU' you have in mind?
As soon as conversions are required, speed is always going to be compromised, but where the platform is already UTF-8 based (an increasingly common situation), all we need is to handle UTF-8 strings quickly. For the best performance, conversions can simply be avoided. So I'm currently treating conversion as a secondary problem (probably less important than case handling!) and just trying to identify what is missing from ICU's UTF-8 support that would need to be added?

It may well be that Windows is a special case that needs its own conversion layer, but that should not form part of any core upgrade. It is not needed for many installations?

-- 
Lester Caine - G8HFL
-----------------------------
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk
