Re: Unicode strings?

From: Date: Wed, 12 Mar 2014 10:16:01 +0000
Subject: Re: Unicode strings?
Groups: php.internals
Crypto Compress wrote:
>> The very first problem I hit is ICU's limitation to 32-bit string lengths. How does the switch to 64-bit string lengths on 64-bit platforms impinge on this? While I can see the advantage of this particular change, would that also now require our own version of ICU capable of also handling longer strings? This probably falls out in the wash of my next point ...
>
> Where have you found this information? Can you please provide a source for this?

This information has been published in several places on the list and in the wiki already ... http://userguide.icu-project.org/strings/utf-8 for the ICU, and the RFCs here for 64-bit improvements to PHP ...

> Quote #1: "You can request 64 or 32 bits with the --with-library-bits= option, ..."
> Quote #2: "Strings are represented as UChar * as the base string type."
> http://userguide.icu-project.org/icufaq#TOC-How-do-I-get-32--or-64-bit-versions-of-the-ICU-libraries-
> String length is platform dependent.

It is not only PHP that has hidden gems of information buried in the documentation, but ...

"For UTF-8 strings, ICU normally uses (const) char * pointers and int32_t lengths"

The question here is how the UTF-8 default works in ICU, as we want to actually avoid using UChar altogether, using UText instead - I think?

-- 
Lester Caine - G8HFL
-----------------------------
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk
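
To make the length mismatch concrete, here is a minimal sketch in plain C of the kind of guard that would be needed wherever a 64-bit (size_t) string length is handed to an interface that takes int32_t lengths, as the ICU quote above describes for UTF-8 strings. The helper names fits_int32_length and to_int32_length are hypothetical and exist only for illustration; they are not part of ICU or PHP, and no ICU call is made here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an ICU or PHP function):
 * returns 1 if a size_t byte length can be represented as a
 * non-negative int32_t, 0 otherwise. */
int fits_int32_length(size_t len)
{
    return len <= (size_t)INT32_MAX;
}

/* Hypothetical helper (not an ICU or PHP function):
 * narrows a size_t length to int32_t, or returns -1 when the
 * string is too long for an int32_t-length interface. */
int32_t to_int32_length(size_t len)
{
    return fits_int32_length(len) ? (int32_t)len : -1;
}
```

A caller would check the result of to_int32_length() before passing the buffer on, and reject or split the string when it gets -1. Splitting is an assumption of this sketch, and it would have to respect UTF-8 code point boundaries.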
