random access uncompressed unencrypted ZipExtFile #128131

Closed
vvb2060 opened this issue Dec 20, 2024 · 6 comments
Labels
stdlib (Python modules in the Lib dir), type-bug (An unexpected behavior, bug, or error)

Comments

@vvb2060

vvb2060 commented Dec 20, 2024

Feature or enhancement

Proposal:

# Fast seek uncompressed unencrypted file
elif self._compress_type == ZIP_STORED and self._decrypter is None and read_offset > 0:
    # disable CRC checking after first seeking - it would be invalid
    self._expected_crc = None
    # seek actual file taking already buffered data into account
    read_offset -= len(self._readbuffer) - self._offset
    self._fileobj.seek(read_offset, os.SEEK_CUR)
    self._left -= read_offset
    read_offset = 0
    # flush read buffer
    self._readbuffer = b''
    self._offset = 0
elif read_offset < 0:
    # Position is before the current position. Reset the ZipExtFile

If read_offset < 0, the ZipExtFile is reset and read again from the beginning, even for an uncompressed, unencrypted member. I think the read_offset > 0 condition on the fast-seek branch is unnecessary.
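To illustrate why the branch above does not depend on the sign of read_offset, here is a minimal toy model of the same buffer arithmetic. The class BufferedStoredReader and its method seek_relative are hypothetical names made up for this sketch; it is not the zipfile implementation, it does no CRC handling, and it assumes the caller keeps the target position inside the payload.

```python
import io
import os


class BufferedStoredReader:
    """Toy stand-in for a stored (uncompressed, unencrypted) member.

    Hypothetical class for illustration only; attribute names mirror the
    snippet quoted above, but this is not zipfile.ZipExtFile.
    """

    def __init__(self, payload, chunk=4):
        self._fileobj = io.BytesIO(payload)
        self._left = len(payload)   # bytes not yet pulled from _fileobj
        self._readbuffer = b""
        self._offset = 0            # read position inside _readbuffer
        self._chunk = chunk

    def read(self, n):
        out = b""
        while len(out) < n:
            if self._offset >= len(self._readbuffer):
                if self._left == 0:
                    break           # end of payload
                data = self._fileobj.read(min(self._chunk, self._left))
                self._left -= len(data)
                self._readbuffer = data
                self._offset = 0
            take = self._readbuffer[self._offset:self._offset + n - len(out)]
            self._offset += len(take)
            out += take
        return out

    def seek_relative(self, read_offset):
        # Same arithmetic as the quoted branch, applied to either sign.
        # The underlying file is ahead of the logical position by the
        # number of buffered-but-unread bytes, so subtract them first.
        read_offset -= len(self._readbuffer) - self._offset
        self._fileobj.seek(read_offset, os.SEEK_CUR)
        self._left -= read_offset
        # Flush the read buffer; the next read refills it.
        self._readbuffer = b""
        self._offset = 0


if __name__ == "__main__":
    reader = BufferedStoredReader(b"0123456789")
    print(reader.read(2))       # logical position moves to 2
    reader.seek_relative(5)     # forward, to position 7
    print(reader.read(2))
    reader.seek_relative(-6)    # backward, from position 9 to 3
    print(reader.read(3))
```

In this model a negative read_offset simply moves the underlying file backward and increases _left, which is the behaviour the proposal asks for.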

Has this already been discussed elsewhere?

This is a minor feature, which does not need previous discussion elsewhere

Links to previous discussion of this feature:

No response

Linked PRs

@danifus
Contributor

danifus commented Dec 22, 2024

There is a bug in zipfile._SharedFile.seek() that affects concurrent reads of uncompressed, unencrypted files which should be merged along with this PR: #127856

@jaraco
Member

jaraco commented Dec 26, 2024

Can you revise the original post to clarify what’s going on here? I read it, but I don’t understand: what is wrong with the current behavior? Under what conditions do the problems occur and thus who is affected? What do you expect instead? Just articulate as much as you can so it’s clear from the problem description what the proposed improvement is.

@vvb2060
Author

vvb2060 commented Dec 31, 2024

#27737 does not really support random access. Currently, zipfile can only seek forward quickly in an uncompressed, unencrypted ZipExtFile, so in practice such a file can only be read sequentially.

@jaraco
Member

jaraco commented Jan 20, 2025

Thanks vvb2060. Let me re-articulate what I think you're saying.

In #27737, the zipfile module introduced the possibility of seeking within a zip file, but only for compressed or encrypted payloads. Today, if someone attempts to seek with an offset that's 0 or negative, the wrong logic is reached, and an error occurs. Instead, it should be straightforward to enable seeking in any direction (any value of read_offset) consistently (for unencrypted, uncompressed files).
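As a concrete picture of the access pattern under discussion, the following sketch writes one ZIP_STORED member into an in-memory archive and then seeks forward, backward, and from the end before reading. The helper names make_stored_zip and read_slices, the member name data.bin, and the payload are assumptions made up for this example. It uses only the public zipfile API and should return the same bytes whether or not the fast path is taken; only the amount of re-reading differs.

```python
import io
import os
import zipfile


def make_stored_zip(name, payload):
    """Return the bytes of a zip archive holding one uncompressed member."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        zf.writestr(name, payload)
    return buf.getvalue()


def read_slices(zip_bytes, name, requests):
    """Apply (offset, whence, size) requests in order and collect the reads."""
    results = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        with zf.open(name) as member:
            for offset, whence, size in requests:
                member.seek(offset, whence)
                results.append(member.read(size))
    return results


if __name__ == "__main__":
    payload = bytes(range(256)) * 64          # 16384 bytes
    archive = make_stored_zip("data.bin", payload)
    for chunk in read_slices(archive, "data.bin", [
        (10000, os.SEEK_SET, 4),              # forward seek
        (-9000, os.SEEK_CUR, 4),              # backward seek
        (-16, os.SEEK_END, 16),               # relative to the end
        (0, os.SEEK_SET, 8),                  # back to the start
    ]):
        print(chunk.hex())
```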

jaraco pushed a commit that referenced this issue Jan 20, 2025
…crypted files in ZipFile (#128143)

Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
Co-authored-by: Bénédikt Tran <[email protected]>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Jan 20, 2025
…d unencrypted files in ZipFile (pythonGH-128143)

(cherry picked from commit dda02eb)

Co-authored-by: 5ec1cff <[email protected]>
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
Co-authored-by: Bénédikt Tran <[email protected]>
jaraco pushed a commit that referenced this issue Jan 20, 2025
…ed unencrypted files in ZipFile (GH-128143) (#129091)

GH-128131: Completely support random read access of uncompressed unencrypted files in ZipFile (GH-128143)
(cherry picked from commit dda02eb)

Co-authored-by: 5ec1cff <[email protected]>
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
Co-authored-by: Bénédikt Tran <[email protected]>
picnixz added the type-bug (An unexpected behavior, bug, or error) label and removed the type-feature (A feature request or enhancement) label on Jan 20, 2025
@picnixz
Member

picnixz commented Jan 20, 2025

See #128143 (comment) for the rationale of backports and the type-bug label.

jaraco pushed a commit that referenced this issue Jan 20, 2025
…ed unencrypted files in ZipFile (GH-128143) (#129092)

GH-128131: Completely support random read access of uncompressed unencrypted files in ZipFile (GH-128143)
(cherry picked from commit dda02eb)

Co-authored-by: 5ec1cff <[email protected]>
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
Co-authored-by: Bénédikt Tran <[email protected]>
@picnixz
Member

picnixz commented Jan 20, 2025

Closing since completed and backported. Thank you everyone.

picnixz closed this as completed Jan 20, 2025
srinivasreddy pushed a commit to srinivasreddy/cpython that referenced this issue Jan 21, 2025
…d unencrypted files in ZipFile (python#128143)

Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
Co-authored-by: Bénédikt Tran <[email protected]>