Re: utf-8 filenames in phar files.
Hi Dan,
On Sat, Feb 15, 2014 at 1:11 AM, Dan Ackroyd <[email protected]> wrote:
> That is not an issue as:
>
> i) Phar files produced on a windows machine should be identical to
> those produced on a Linux or OSX box.
>
> ii) There is a test in the phar code, so that if you do have filenames
> that are degenerate after normalising, the extraction throws an error.
> e.g. for the files
>
> $filename1 = "Am\xC3\xA9lie.txt";
> $filename2 = "Am\x65\xCC\x81lie.txt";
>
> If you add both to a phar archive and then attempt to extract them
> both you get the error:
>
> "Cannot extract "Amélie.txt" to "output/Amélie.txt", path
> already exists"
>
I suppose there is no normalization code in phar, so it is your system (OS /
file system) that normalizes the file names.
Depending on the system's normalization is not good:
- File names could be NFC or NFD
- File names in a phar may differ depending on the system that created it
- Systems that do not actively normalize Unicode exist
I do see file name normalization issues between my Linux/Windows and OSX
machines with git (core.precomposeunicode=true is required for correct
operation on OSX).
I suggest applying NFC normalization to avoid the issue, as git does.
core.precomposeunicode
This option is only used by Mac OS implementation of Git. When
core.precomposeunicode=true, Git reverts the unicode decomposition of
filenames done by Mac OS. This is useful when sharing a repository between
Mac OS and Linux or Windows. (Git for Windows 1.7.10 or higher is needed,
or Git under cygwin 1.7). When false, file names are handled fully
transparent by Git, which is backward compatible with older versions of Git.
http://git-scm.com/docs/git-config
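To illustrate the NFC suggestion with the two file names from Dan's example,
here is a minimal sketch. It is written in Python only because its standard
library ships unicodedata; it is not phar code, and nfc_name and
find_collisions are hypothetical helper names used for illustration.

```python
import unicodedata

# The two file names from the quoted example, given as raw UTF-8 bytes.
precomposed = b"Am\xC3\xA9lie.txt".decode("utf-8")     # U+00E9 (NFC form)
decomposed = b"Am\x65\xCC\x81lie.txt".decode("utf-8")  # "e" + U+0301 (NFD form)


def nfc_name(name: str) -> str:
    """Return the NFC form of a file name (hypothetical helper)."""
    return unicodedata.normalize("NFC", name)


def find_collisions(names):
    """Group names that become identical after NFC normalization."""
    groups = {}
    for name in names:
        groups.setdefault(nfc_name(name), []).append(name)
    # Keep only the NFC names that more than one input maps to.
    return {key: group for key, group in groups.items() if len(group) > 1}


if __name__ == "__main__":
    # The raw strings differ, but their NFC forms are the same.
    print(precomposed == decomposed)
    print(nfc_name(precomposed) == nfc_name(decomposed))
    print(find_collisions([precomposed, decomposed, "readme.txt"]))
```

If names were normalized like this when entries are added, the duplicate
could be detected on every platform, not only where the file system happens
to normalize.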
As Rowan pointed out, although ICU is always detected by acinclude.m4, #if
should be used around ICU/intl related code. (intl uses ICU, so using intl
means using ICU. I think it's better not to rely on intl: it may be disabled
or built as a DL module, and there are also systems without ICU.)
Regards,
--
Yasuo Ohgaki
[email protected]