Re: Unicode strings?

From: Lester Caine Date: Tue, 11 Mar 2014 12:27:50 +0000

Subject: Re: Unicode strings?

References: 1 2 Groups: php.internals

Request: Send a blank email to [email protected] to get a copy of this message

Crypto Compress wrote:
>> I'm slowly working through a long list of things relating to Unicode strings,
>> trying to work out just where the main problems are.
>>
>> The very first problem I hit is ICU's limitation to 32-bit string lengths. How
>> does the switch to 64-bit string lengths on 64-bit platforms impinge on this?
>> While I can see the advantage of this particular change, would that also now
>> require our own version of ICU capable of handling longer strings? This
>> probably falls out in the wash of my next point ...

> Where have you found this information? Can you please provide a source for this?

This information has already been published in several places on the list and in the wiki:
http://userguide.icu-project.org/strings/utf-8 for ICU, and the RFCs here for the 64-bit improvements to PHP ...
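To make the length mismatch concrete: ICU's C APIs take string lengths as `int32_t`, while a 64-bit PHP build would store lengths in a 64-bit type. The sketch below is hypothetical (`fits_icu_length` and `utf16_units` are illustrative names, not PHP or ICU functions) and does not call ICU at all. It only shows the bounds check a caller would need before handing a string to ICU in one piece.

```python
# Hypothetical helpers, not part of PHP or ICU. ICU's C APIs take string
# lengths as int32_t, while a 64-bit PHP build stores lengths in a 64-bit type.
INT32_MAX = 2**31 - 1


def utf16_units(s: str) -> int:
    # Number of UTF-16 code units needed to hold s (ICU works in UTF-16).
    return len(s.encode("utf-16-le")) // 2


def fits_icu_length(utf8_byte_len: int) -> bool:
    # UTF-8 -> UTF-16 never yields more code units than input bytes
    # (1-3 byte sequences give one unit, 4-byte sequences give two),
    # so bounding the byte length also bounds the UTF-16 length.
    return 0 <= utf8_byte_len <= INT32_MAX


print(fits_icu_length(1024), fits_icu_length(2**31))  # prints: True False
```

A string that fails this check would have to be rejected or processed in pieces.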

>> Currently strings are simply strings? I'm sure we have already had this
>> discussion, and it will be necessary to switch from simple strings to a string
>> object which can handle the intricacies of Unicode?

> Yes, currently we have so-called binary strings (simple bytes, 8 bits).
> No, we should not create a string object to handle all the intricacies of Unicode.
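A binary string only knows its bytes, so its length is a byte count; counting code points means interpreting those bytes as UTF-8. A minimal sketch, assuming valid UTF-8 input (`byte_length` and `utf8_code_points` are illustrative helpers, not PHP functions): every code point starts with exactly one byte that is not a continuation byte (`10xxxxxx`), so counting the other bytes gives the code-point count.

```python
def byte_length(data: bytes) -> int:
    # A binary string only knows its bytes, so its length is a byte count.
    return len(data)


def utf8_code_points(data: bytes) -> int:
    # Assumes valid UTF-8: every code point starts with exactly one byte that
    # is not a continuation byte (continuation bytes look like 0b10xxxxxx).
    return sum(1 for b in data if (b & 0xC0) != 0x80)


s = "na\u00efve \u20ac".encode("utf-8")
print(byte_length(s), utf8_code_points(s))  # prints: 10 7
```

Code points are still not user-perceived characters: `"e\u0301"` is two code points but displays as a single accented letter, which is where the more complex character counts come in.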

How do you provide a holder for the various additional items a Unicode 'object' requires? I can see that one could get away with calling functions on a simple string every time, but once different versions of the same string, or complex character counts, have been calculated, shouldn't they be cached so they can be reused? Or does one keep each result in a separate variable?
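One way to picture the holder asked about here is an immutable wrapper that keeps the raw bytes and caches each derived value the first time it is requested. This is a hypothetical Python sketch (`UString` is an illustrative name, not a proposed PHP API), using `functools.cached_property` and `unicodedata.normalize` from the standard library.

```python
import unicodedata
from functools import cached_property


class UString:
    """Hypothetical immutable string holder: raw UTF-8 bytes plus lazily
    computed, cached derived values, so repeated queries do not re-scan."""

    def __init__(self, data: bytes):
        # Treated as immutable; the caches below are never invalidated.
        self._data = data

    @cached_property
    def text(self) -> str:
        # Decoded once; raises UnicodeDecodeError on invalid UTF-8.
        return self._data.decode("utf-8")

    @cached_property
    def code_points(self) -> int:
        return len(self.text)

    @cached_property
    def nfc(self) -> "UString":
        # A "different version of the same string", computed once and kept.
        return UString(unicodedata.normalize("NFC", self.text).encode("utf-8"))

    def __len__(self) -> int:
        # Byte length, as for a binary string.
        return len(self._data)


u = UString("e\u0301".encode("utf-8"))
print(len(u), u.code_points, u.nfc.code_points)  # prints: 3 2 1
```

The trade-off: the cache is only valid while the bytes never change, and every string carries the extra fields, whereas keeping each result in a separate variable costs nothing up front but leaves the caller to track which result belongs to which string.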

-- 
Lester Caine - G8HFL
-----------------------------
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk

