Re: PHP6 wiki page

From: Date: Fri, 14 Feb 2014 21:06:50 +0000
Subject: Re: PHP6 wiki page
Groups: php.internals
Rasmus Lerdorf wrote:
What we really need is an awesome small and fast Unicode library that does everything ICU does but faster and in less code while using UTF-8 as its internal storage so we don't have to convert on each and every operation. There are a ton of non-obvious things beyond simple string manipulation. String collation alone is massively complicated, for example.
Surely the bottom line is that to cover every fine detail, ICU has to be used, as the smaller libraries tend to make a few assumptions to make life easy? But my point was that most of the time you only need the simple stuff? Simply using UTF8 strings in place of the byte based ones in all of the relevant string functions? Remove the need to 'lowercase' by dropping case-insensitivity and things are simplified somewhat?

I've found the comment I was looking for finally while searching around ... "UTF-8 is specially designed so that many byte-oriented string functions continue to work or only need minor modifications." This is why people can put Unicode characters in many places in PHP now without it actually breaking?

I've seen a few comments about switching to C++ and http://utfcpp.sourceforge.net/ caught my eye, while http://www.public-software-group.org/utf8proc-documentation came to light when I started looking at NFD/NFC. I've been looking for a suitable Unicode string handler for doing substring clipping and all of that. I AM right in thinking that mbstring is basically overkill if everything being worked with has already been converted to UTF8?

While I was aware of accent code points, I'd not quite appreciated how complicated they can get. Up until now I've just been looking at text cut and pasted from UTF8 messages.

If one simply ignores the transcoding in and out, leaving the core only to handle clean UTF8 strings, what non-trivial things are left? Could this be a candidate for a SOC project?

-- 
Lester Caine - G8HFL
-----------------------------
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk
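
To make the points about byte-oriented functions, substring clipping and accent code points concrete, here is a rough sketch in Python (chosen only for brevity; it is not PHP's string code or any of the libraries linked above). The helper name clip_utf8 is hypothetical, and the sketch assumes the input is already clean, valid UTF-8. It shows that a plain byte search still works, that clipping only needs a small adjustment to avoid cutting a code point in half, and that combining accents remain a problem even then.

```python
import unicodedata


def clip_utf8(data: bytes, max_bytes: int) -> bytes:
    """Clip valid UTF-8 to at most max_bytes without splitting a code point.

    Hypothetical helper for illustration only; assumes `data` is valid UTF-8.
    """
    if max_bytes >= len(data):
        return data
    end = max(max_bytes, 0)
    # Continuation bytes have the bit pattern 10xxxxxx. If the cut would land
    # on one, back up until the cut sits at the start of a code point.
    while end > 0 and (data[end] & 0xC0) == 0x80:
        end -= 1
    return data[:end]


text = "na\u00efve"              # "naive" with i-diaeresis, 5 code points
raw = text.encode("utf-8")       # 6 bytes, since U+00EF takes two bytes

# A byte-oriented search finds the encoded character with no Unicode awareness.
offset = raw.find("\u00ef".encode("utf-8"))

# A cut at 3 bytes would split U+00EF, so the helper backs up to 2 bytes.
clipped = clip_utf8(raw, 3)

# The same visible character can be one code point (NFC) or two (NFD).
composed = unicodedata.normalize("NFC", "e\u0301")
decomposed = unicodedata.normalize("NFD", "\u00e9")
```

Clipping on code point boundaries keeps the bytes valid, but with decomposed text a cut can still separate a base letter from its combining accent, and two strings that look identical can compare unequal until both are normalized to the same form. Those are some of the things left over even when the core only ever sees clean UTF8.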
