ZipInfo filename is mangled when os.sep is not '/' #94529

Open
@gerph

Bug report

The ZipInfo object within ZipFile performs an explicit translation of the filename.

cpython/Lib/zipfile.py

Lines 378 to 382 in 7db1d2e

        # This is used to ensure paths in generated ZIP files always use
        # forward slashes as the directory separator, as required by the
        # ZIP format specification.
        if os.sep != "/" and os.sep in filename:
            filename = filename.replace(os.sep, "/")

I believe this is intended to make the class easy to use on Windows, where you might pass an explicit pathname when creating the ZipInfo object. On Windows the filesystem separator is commonly \ (although / is supported in many cases), so this forces the filename attribute to contain the name in Unix form.
This logic is used whether the ZipInfo object is created manually (usually to add a new file), or when the filename has been taken from an archive that is being extracted.

However, the logic is broken on systems where os.sep is anything other than \ or /. On a system where os.sep is '.', creating an archive that contains a file with a '.' extension mangles the filename stored in the archive. On such a system, extracting an archive also mangles the filename.

To demonstrate this, here is a simple example at the interactive prompt:

>>> import zipfile
>>> import os
>>> os.sep = '.'
>>> zipfile.ZipInfo('hello.txt')
<ZipInfo filename='hello/txt' file_size=0>
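The same check can be run as a script without leaving os.sep modified afterwards. This is only a sketch: it assumes zipfile still reads os.sep at call time, as in the code quoted above, and uses unittest.mock to restore the real separator when the block exits.

```python
import os
import zipfile
from unittest import mock

original_sep = os.sep

# Temporarily pretend to be on a platform whose separator is '.',
# as on RISC OS. The real value is restored when the block exits.
with mock.patch.object(os, "sep", "."):
    info = zipfile.ZipInfo("hello.txt")
    observed = info.filename

# On an affected Python version this shows the mangled name.
print(observed)
```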

In the real world, this breaks any possibility of using this module on RISC OS, where the filesystem separator in os.sep is '.'. In the current Python 3 on RISC OS, the zipfile module always mangles filenames that have standard extensions.

I believe that the intention of the object is that:

  • the filename initialiser on the object and the filename attribute are in Unicode format (this has been enforced since Python 3 by the explicit decodes in the archive member reading code).
  • the filename attribute is formed as it would be stored in the archive, using / as a directory separator (the documentation states that filename should be the full name of the archive member).
  • the filename initialiser on the object is allowed to be supplied a path name on unix and windows systems, as a convenience (the referenced code will have been relied on by existing software).

As such, I believe the referenced code is broken, and to retain the above assumptions and to allow the handling of zip archives on systems where os.sep is not / or \, the code should instead read:

        if os.sep == "\\" and os.sep in filename:
            filename = filename.replace(os.sep, "/")

This removes the overzealous replacement of os.sep in the creation of the ZipInfo object.
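To compare the two checks side by side without touching os.sep, here is a minimal sketch that pulls each one out into a standalone function. The names normalize_current and normalize_proposed and the sep parameter are hypothetical, for illustration only; they are not part of zipfile.

```python
def normalize_current(filename, sep):
    """Mirror of the check quoted from zipfile.py, with os.sep passed in as sep."""
    if sep != "/" and sep in filename:
        filename = filename.replace(sep, "/")
    return filename


def normalize_proposed(filename, sep):
    """The proposed check: translate only when the separator is a backslash."""
    if sep == "\\" and sep in filename:
        filename = filename.replace(sep, "/")
    return filename


# (separator, input name) pairs for Unix-like, Windows and RISC OS style systems.
cases = [
    ("/", "dir/hello.txt"),
    ("\\", "dir\\hello.txt"),
    (".", "hello.txt"),
]

for sep, name in cases:
    print(repr(sep), name,
          normalize_current(name, sep),
          normalize_proposed(name, sep))
```

The two functions agree when the separator is / or \ and differ only when it is something else, such as '.', where the proposed check leaves the name untouched.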

Further problems exist with the from_file method which I shall raise separately.

There are some issues which might be related to this (but this change does not preclude them): #90139 and #92184.

Your environment

  • CPython versions tested on: Python 3.9, 3.10
  • Operating system and architecture: On OS X, simulating the problem seen on RISC OS.

Labels: stdlib (Python modules in the Lib dir), type-bug (An unexpected behavior, bug, or error)
