Re: utf-8 filenames in phar files.
Hi Dan,
On Fri, Feb 14, 2014 at 10:00 AM, Dan Ackroyd <[email protected]> wrote:
> I'm not sure I understand you. Do you have any example filenames that
> I can test against to make sure that different filenames don't 'appear
> the same'?
>
You can create filenames that appear the same but have different byte
representations under NFC and NFD normalization. For instance,
"がぎぐげご.txt" will have a different byte pattern in each form, since
NFD decomposes 「が」 into 「か」 and 「゛」, and so on.
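A minimal sketch of the difference, using Python's standard unicodedata
module purely for illustration (it is not part of ext/phar). Note that
the mark NFD actually produces is the combining form U+3099, which is
displayed much like 「゛」.

```python
import unicodedata

name = "がぎぐげご.txt"

# The same visible filename in composed (NFC) and decomposed (NFD) form.
nfc = unicodedata.normalize("NFC", name)
nfd = unicodedata.normalize("NFD", name)

# NFD splits each voiced kana into a base kana plus U+3099
# (COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK).
print([hex(ord(ch)) for ch in nfd[:2]])  # ['0x304b', '0x3099']

# The strings compare unequal and encode to different UTF-8 bytes.
print(nfc == nfd)                                          # False
print(len(nfc.encode("utf-8")), len(nfd.encode("utf-8")))  # 19 34
```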
Filenames on Windows and Linux seem to be in NFC, but that is a
coincidence: those systems simply end up with the composed form of
Unicode. They do not compose filenames intentionally.
OSX decomposes filenames intentionally. Decomposed filenames appear the
same as composed ones on Windows and Linux, so it is possible to have
two files whose names are semantically the same.
NOTE: OSX's NFD differs slightly from the Unicode standard.
Older versions of Subversion and Git did not account for the
normalization difference and created multiple filenames that appear the
same when a user worked on both OSX and Windows/Linux.
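One way to test for such look-alike names is to group a list of
filenames by their NFC form and report any group with more than one
distinct spelling. This is only a sketch in Python; the function name
find_normalization_collisions is made up for this example.

```python
import unicodedata
from collections import defaultdict

def find_normalization_collisions(names):
    """Hypothetical helper: map each NFC form to the distinct raw
    spellings that normalize to it, keeping only real collisions."""
    groups = defaultdict(set)
    for raw in names:
        groups[unicodedata.normalize("NFC", raw)].add(raw)
    return {key: sorted(raws) for key, raws in groups.items() if len(raws) > 1}

# Escapes are used so the two spellings cannot be silently merged by an editor.
names = [
    "\u304c.txt",        # composed, as typically created on Windows/Linux
    "\u304b\u3099.txt",  # decomposed, as typically created on OSX
    "readme.txt",
]
collisions = find_normalization_collisions(names)
print(len(collisions))  # 1
```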
> Also, the filenames used in phar files are not exposed to the
> underlying system. They are held completely within the PHP phar file
> and shouldn't be affected by platform. The restriction on characters
> was caused by ext/phar explicitly rejecting utf-8 multibyte
> characters.
>
I don't use phar much. It's possible to use it as an archive, right?
https://php.net/phar.extractto
I think it's a great change even without normalization. It would be
better if normalization were taken care of.
To handle the normalization difference, you could apply NFC
normalization on OSX.
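As a rough illustration of that suggestion (in Python rather than in
ext/phar itself), an entry name could be converted to NFC before it is
stored. The helper name to_nfc and the sample path are assumptions for
this example only, and standard NFC may not round-trip every name that
OSX's variant of NFD produces.

```python
import unicodedata

def to_nfc(entry_name: str) -> str:
    """Hypothetical helper: return the entry name in composed (NFC) form."""
    return unicodedata.normalize("NFC", entry_name)

# A decomposed name, such as one read from an OSX filesystem.
decomposed = "docs/\u304b\u3099.txt"
composed = to_nfc(decomposed)

print(composed == "docs/\u304c.txt")  # True
print(to_nfc(composed) == composed)   # True: normalizing twice changes nothing
```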
Regards,
--
Yasuo Ohgaki
[email protected]