On 20/02/2014 06:54, Pierre Joye wrote:
hi,
Hello :-),
Unicode still remains one of the top requested features in PHP.
However, as Rasmus and others stated earlier, it is not a trivial job.
Some of the key points we need to take care of are:
- UTF-8 storage
- UTF-8 support for almost all (if not all) existing string APIs
- Performance
As of today, I did not find any library covering at least two of these
key points.
[snip]
I would like to start discussing our options now. I am not
asking to get into all the implementation details from a userland point of
view (like u"some text" or adding new APIs or not) but only to see what
we can do internally to work with UTF-8 strings.
Just a little note: a u"foobar" syntax would let us switch internally between a light and a heavy implementation, and thus it would help to cover at least two of the key points described above.
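To make the light/heavy idea concrete, here is a minimal sketch, written in Rust only for brevity. Everything in it is hypothetical (the `PhpStr` type, its variants and the `unicode` flag standing for the u prefix); it is not a proposal for the actual engine structures, just an illustration of how the prefix could select the internal representation.

```rust
/// Hypothetical internal string value with two representations.
#[derive(Debug, PartialEq)]
enum PhpStr {
    /// "Light": raw bytes with no encoding guarantee, as "foobar" is today.
    Bytes(Vec<u8>),
    /// "Heavy": bytes that have been validated as UTF-8, as u"foobar" could be.
    Utf8(String),
}

impl PhpStr {
    /// Build a value from a literal. `unicode` models the presence of the
    /// u prefix (an assumption of this sketch, not an existing PHP feature).
    fn from_literal(raw: &[u8], unicode: bool) -> Result<PhpStr, std::str::Utf8Error> {
        if unicode {
            // Validation happens once, when the value is created.
            let s = std::str::from_utf8(raw)?;
            Ok(PhpStr::Utf8(s.to_owned()))
        } else {
            Ok(PhpStr::Bytes(raw.to_vec()))
        }
    }

    /// Length as each representation would report it:
    /// bytes for the light one, code points for the heavy one.
    fn len(&self) -> usize {
        match self {
            PhpStr::Bytes(b) => b.len(),
            PhpStr::Utf8(s) => s.chars().count(),
        }
    }
}
```

With such a split, existing code keeps the cheap byte semantics, and only strings marked with the prefix pay for validation and code point counting.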
I would mention the Rust implementation of UTF-8 strings [1, 2]. It is fast, it is safe and it has a nice, large API. I am not saying I want to see PHP using Rust: I think it would be hard to do (even if it would certainly benefit PHP). But the algorithms they use can be a source of inspiration for us. Maybe we should consider it if we decide to write our own implementation instead of using a third-party library.
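For those who do not know it, here is a small sample of what the Rust string API gives you, using only standard library calls (`len`, `chars`, `char_indices`, `std::str::from_utf8`). The three helper functions are names I made up for this mail, and the PHP equivalents are for us to design.

```rust
/// Number of bytes and number of Unicode scalar values in a string.
fn byte_and_char_len(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

/// Validate untrusted bytes as UTF-8 without copying them.
fn validate(bytes: &[u8]) -> Option<&str> {
    std::str::from_utf8(bytes).ok()
}

/// Take the first `n` characters, always cutting on a character boundary.
fn prefix_chars(s: &str, n: usize) -> &str {
    // char_indices yields (byte offset, char), so the offset of the
    // character at position n is where the prefix ends.
    match s.char_indices().nth(n) {
        Some((byte_idx, _)) => &s[..byte_idx],
        None => s, // fewer than n characters: return the whole string
    }
}
```

The storage stays plain UTF-8 bytes, and the character-aware operations are built on top of iterators over it, which is the part I find inspiring.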
Cheers.
[1] https://github.com/mozilla/rust/blob/master/src/libstd/str.rs
[2] http://static.rust-lang.org/doc/master/std/str/index.html
--
Ivan Enderlin
Developer of Hoa
http://hoa-project.net/
PhD. student at DISC/Femto-ST (Vesontio) and INRIA (Cassis)
http://disc.univ-fcomte.fr/ and http://www.inria.fr/
Member of HTML and WebApps Working Group of W3C
http://w3.org/