Re: utf-8 filenames in phar files.

Date: Sat, 15 Feb 2014 01:07:46 +0000
Subject: Re: utf-8 filenames in phar files.
Groups: php.internals

Yasuo wrote:

> File names in phar may differ by system

No. No they won't.

They will be exactly as they are specified when they are added by the
user to the Phar archive.
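
As a rough illustration of that point, here is a minimal sketch in Python (not phar's own code; Python is used only because `unicodedata` ships in its standard library). It takes the two names from the quoted example further down and checks that they are different byte sequences as stored, yet equal once both are normalized to NFC, which is presumably what a normalizing file system does at extraction time. The variable names are illustrative only.

```python
import unicodedata

# The two names from the quoted example, as raw UTF-8 bytes.
name_precomposed = b"Am\xC3\xA9lie.txt"      # U+00E9, one code point
name_decomposed = b"Am\x65\xCC\x81lie.txt"   # "e" followed by combining U+0301

# Stored byte-for-byte, these are two different archive entry names.
stored_distinct = name_precomposed != name_decomposed

# After NFC normalization both decode to the same string.
same_after_nfc = (
    unicodedata.normalize("NFC", name_precomposed.decode("utf-8"))
    == unicodedata.normalize("NFC", name_decomposed.decode("utf-8"))
)

print(stored_distinct, same_after_nfc)
```

So the archive itself can hold both entries; a collision would only appear when a file system that normalizes names is asked to write them both.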

cheers
Dan

On 14 February 2014 23:56, Yasuo Ohgaki <[email protected]> wrote:
> Hi Dan,
>
> On Sat, Feb 15, 2014 at 1:11 AM, Dan Ackroyd <[email protected]> wrote:
>>
>> That is not an issue as:
>>
>> i) Phar files produced on a Windows machine should be identical to
>> those produced on a Linux or OSX box.
>>
>> ii) There is a test in the phar code, so that if you do have filenames
>> that are degenerate after normalising, the extraction throws an error.
>> e.g. for the files
>>
>>     $filename1 = "Am\xC3\xA9lie.txt";
>>     $filename2 = "Am\x65\xCC\x81lie.txt";
>>
>> If you add both to a phar archive and then attempt to extract them
>> both you get the error:
>>
>>     "Cannot extract "Amélie.txt" to "output/Amélie.txt", path already
>>     exists"
>
>
> I suppose there is no normalization code in phar, so your system (OS / file
> system) is what normalizes the file name.
>
> Relying on the system's normalization is not good:
>
>  - File name could be NFC or NFD
>  - File names in phar may differ by system
>  - Systems that do not actively normalize Unicode exist
>
> I do see file name normalization issues between my Linux/Windows and OSX
> machines with git (core.precomposeunicode=true is required for correct
> operation on OSX). I suggest applying NFC normalization to avoid the issue,
> as git does.
>
> core.precomposeunicode
> This option is only used by Mac OS implementation of Git. When
> core.precomposeunicode=true, Git reverts the unicode decomposition of
> filenames done by Mac OS. This is useful when sharing a repository between
> Mac OS and Linux or Windows. (Git for Windows 1.7.10 or higher is needed, or
> Git under cygwin 1.7). When false, file names are handled fully transparent
> by Git, which is backward compatible with older versions of Git.
> http://git-scm.com/docs/git-config
>
> As Rowan pointed out, although ICU is always detected by acinclude.m4, #if
> should be used for ICU/intl related code. (intl uses ICU, so using intl
> means using ICU. I think it's better not to rely on intl: it may be disabled
> or built as a DL module, and there are also systems without ICU.)
>
> Regards,
>
> --
> Yasuo Ohgaki
> [email protected]
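
For reference, the NFC-on-add approach suggested in the quoted message could look roughly like the sketch below. It is written in Python with the standard `unicodedata` module and is not phar's behaviour or API: the `add_normalized` helper and the dict standing in for an archive are hypothetical. Each name is normalized to NFC before it is stored, and an entry whose normalized name already exists is rejected at add time rather than at extraction.

```python
import unicodedata


def add_normalized(entries: dict, name: str, data: bytes) -> str:
    """Store data under the NFC form of name; refuse normalized duplicates."""
    key = unicodedata.normalize("NFC", name)
    if key in entries:
        raise ValueError(f"entry already exists after normalization: {key!r}")
    entries[key] = data
    return key


# Hypothetical stand-in for an archive's name -> contents table.
archive = {}
add_normalized(archive, "Am\u00e9lie.txt", b"first")

try:
    # Same visible name, but in decomposed (NFD) form.
    add_normalized(archive, "Ame\u0301lie.txt", b"second")
    collided = False
except ValueError:
    collided = True

print(collided, list(archive))
```

The trade-off is the one debated in this thread: normalizing makes names consistent across systems, but the archive no longer stores them exactly as the user supplied them.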

