Too early EOFError #101911

Open
christophgil opened this issue Feb 14, 2023 · 14 comments
Labels: 3.12 (only security fixes), stdlib (Python modules in the Lib dir), topic-IO, type-bug (An unexpected behavior, bug, or error)

@christophgil

christophgil commented Feb 14, 2023

I am using ziprofs.py on Ubuntu. With Python 3.11.1 it works very well. With Python
3.12.0a4+ there are problems. In both cases, the same ziprofs.py and fuse.py files are used;
the only difference is the Python version.

Unfortunately, I was unable to reproduce the problem using Linux
commands like wc, gzip, or md5sum. So far, I have only seen it with proprietary closed-source software.
It appears with different files.

When this EOFError occurs in
def read(self,path,size,offset,fh) of ziprofs.py,
the value of the parameter "size" was 131072 and that of "offset" was 75755520.
The size of the file is 525537280 bytes.
Strikingly, the low offset value suggests that EOF has not yet been reached.

Many thanks
C

File "/local/filesystem/git/ZipROFS/ziprofs.py", line 258, in read
    return f.read(size)
           ^^^^^^^^^^^^
  File "/local/python/2023_02_cpython-main/Lib/zipfile/__init__.py", line 948, in read
    data = self._read1(n)
           ^^^^^^^^^^^^^^
  File "/local/python/2023_02_cpython-main/Lib/zipfile/__init__.py", line 1018, in _read1
    data = self._read2(n)
           ^^^^^^^^^^^^^^
  File "/local/python/2023_02_cpython-main/Lib/zipfile/__init__.py", line 1052, in _read2
    raise EOFError
EOFError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/local/filesystem/git/ZipROFS/fuse.py", line 744, in _wrapper
    return func(*args, **kwargs) or 0
           ^^^^^^^^^^^^^^^^^^^^^
  File "/local/filesystem/git/ZipROFS/fuse.py", line 860, in read
    ret = self.operations('read', self._decode_optional_path(path), size, offset, fh)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/local/filesystem/git/ZipROFS/ziprofs.py", line 127, in __call__
    return super().__call__(op, self.root + path, *args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/local/filesystem/git/ZipROFS/fuse.py", line 1097, in __call__
    return getattr(self, op)(*args)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/local/filesystem/git/ZipROFS/ziprofs.py", line 261, in read
    exit(999)
  File "<frozen _sitebuiltins>", line 26, in __call__
SystemExit: 999
@christophgil christophgil added the type-bug An unexpected behavior, bug, or error label Feb 14, 2023
@arhadthedev arhadthedev added stdlib Python modules in the Lib dir topic-IO labels Feb 15, 2023
@arhadthedev
Member

@serhiy-storchaka, @gpshead (as zipfile experts who were active this week)

@christophgil
Author

It worked at commit f9b3706 (erlend-aasland, committed on Jul 22, 2022) and earlier,
and has failed since commit 4f380db (@sobolevn, committed on Oct 4, 2022).

This was validated by
git clone https://github.com/python/cpython.git
git checkout 4f380db
rm python; find . -name '*.o' -delete; ./configure; sed -i 's| -O3 | |1' Makefile && make

However, without a fresh clone I see carry-over from previous checkouts, and the results differ depending on what was checked out before.

Is there a more efficient way to build Python for a particular checkout?
Then I could narrow it down further.

@sobolevn
Member

Take a look at https://git-scm.com/docs/git-bisect
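
To make the bisect suggestion concrete, here is a rough sketch of a step script for `git bisect run`. It is only an illustration: the script name, the build command line, and the repro script are all assumptions, and nothing here comes from ziprofs. The script cleans the tree so nothing carries over from the previous checkout, builds, runs a reproducer, and maps the outcome to the exit codes that `git bisect run` understands (0 = good, 1 = bad, 125 = skip this commit).

```python
#!/usr/bin/env python3
"""Hypothetical step script for `git bisect run`; names and paths are assumptions."""
import subprocess
import sys

SKIP = 125  # exit code that tells `git bisect run` to skip the current commit


def verdict(build_rc: int, test_rc: int) -> int:
    """Map build/test return codes to a `git bisect run` exit code."""
    if build_rc != 0:
        return SKIP  # commit does not build: neither good nor bad
    return 0 if test_rc == 0 else 1  # 0 = good, 1 = bad


def main(repro_path: str) -> int:
    # Delete all untracked build products so nothing carries over between
    # checkouts. Keep this script and the reproducer OUTSIDE the source tree,
    # otherwise this step would delete them as well.
    subprocess.run(["git", "clean", "-fdxq"], check=False)
    build = subprocess.run("./configure -q && make -s -j4", shell=True)
    if build.returncode != 0:
        return verdict(build.returncode, 0)
    # The reproducer is expected to exit non-zero when the bug shows up.
    test = subprocess.run(["./python", repro_path])
    return verdict(build.returncode, test.returncode)


# Only act when a reproducer path is given on the command line.
if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(main(sys.argv[1]))
```

With the commits mentioned above, usage might look like `git bisect start 4f380db f9b3706` (bad first, then good) followed by `git bisect run python3 /tmp/bisect_step.py /tmp/repro.py`, where both /tmp paths are placeholders.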

@christophgil
Author

christophgil commented Feb 16, 2023 via email

@christophgil
Author

christophgil commented Feb 18, 2023 via email

@christophgil
Author

I found the other commit since which occasional, inconsistent problems occur when using ZipROFS; it is listed below.
I contacted the committer via email. The problem cannot be reproduced with simple shell commands.
It eventually occurs when a large number of files are processed by proprietary software.

commit 26a162b BAD
Author: Jan Wolski
Date: Sun May 15 17:49:19 2022 +0300
    gh-89668: Optimize ZipFile file header processing algorithm to avoid unneeded IO (gh-25966)

commit 90e7230 GOOD
Author: Victor Stinner
Date: Sun May 15 11:19:52 2022 +0200
    gh-92781: Avoid mixing declarations and code in C API (#92783)
    Avoid mixing declarations and code in the C API to fix the compiler
    warning: "ISO C90 forbids mixed declarations and code"
    [-Werror=declaration-after-statement].

@christophgil
Author

christophgil commented Feb 20, 2023 via email

@christophgil
Author

I found out that the issues are very likely due to f.seek(offset).

A multithreaded application is loading zip entries via ziprofs
and causes the errors described above.
By setting the thread number to one, the errors are gone.

Looking at the function
def read(self, path, size, offset, fh)
in ziprofs, the seek offset is jumping back and forth
when several threads are reading from the same zip.

When only one thread is reading at a time,
the zip entry is read nicely sequentially.
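
A minimal sketch of the pattern described above, assuming a single zip entry handle shared by several threads. This is not the ziprofs code; `read_at`, `handle_lock`, and the in-memory archive are hypothetical names used only for illustration. The point is that `seek` and `read` form one logical operation, so another thread must not move the position in between.

```python
import io
import threading
import zipfile

# Build a small archive in memory so the sketch is self-contained.
payload = bytes(range(256)) * 4096  # 1 MiB of known data
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("entry.bin", payload)

archive = zipfile.ZipFile(buf)
handle = archive.open("entry.bin")  # one handle shared by all threads
handle_lock = threading.Lock()      # hypothetical lock, not part of ziprofs


def read_at(size, offset):
    """FUSE-style read: seek and read are done as one step under the lock."""
    with handle_lock:
        handle.seek(offset)
        return handle.read(size)


errors = []


def worker(start):
    # Each thread reads at its own offsets, so the shared position jumps
    # back and forth, as observed in the report.
    for offset in range(start, len(payload), 256 * 1024):
        if read_at(4096, offset) != payload[offset:offset + 4096]:
            errors.append(offset)


threads = [threading.Thread(target=worker, args=(i * 64 * 1024,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("mismatches:", len(errors))
```

Removing the `with handle_lock:` line turns this into the unprotected access pattern, which may or may not fail on a given run since the outcome then depends on thread scheduling.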

@jpruciak

jpruciak commented Mar 15, 2023

zipfile is not threadsafe, that's nothing new
that probably means that locks in ziprofs are not working as intended
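
As an alternative to locking, one possible approach (an assumption on my side, not something ziprofs is known to do) is to give every thread its own ZipFile object and entry handle, so that no seek position is shared at all. The sketch below uses `threading.local` for that; `thread_handle` and `read_at` are hypothetical helper names, and the handles are left open for brevity.

```python
import os
import tempfile
import threading
import zipfile

# Write a small archive to a temporary directory (left on disk for brevity).
payload = bytes(range(256)) * 4096  # 1 MiB of known data
zip_path = os.path.join(tempfile.mkdtemp(), "demo.zip")
with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("entry.bin", payload)

_local = threading.local()  # per-thread storage for private handles


def thread_handle():
    """Return this thread's private entry handle, opening it on first use."""
    if not hasattr(_local, "handle"):
        _local.archive = zipfile.ZipFile(zip_path)
        _local.handle = _local.archive.open("entry.bin")
    return _local.handle


def read_at(size, offset):
    f = thread_handle()
    f.seek(offset)  # only this thread ever moves this handle's position
    return f.read(size)


results = {}


def worker(name, offset):
    results[name] = read_at(1024, offset) == payload[offset:offset + 1024]


threads = [threading.Thread(target=worker, args=(i, i * 100_000)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(all(results.values()))
```

The trade-off is one open file descriptor and one decompressor state per thread instead of contention on a single lock.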

@ericvsmith
Member

This looks like either a problem with ziprofs, a problem with user code, and/or a feature request for a thread-safe zipfile. So I’m going to close this.

If you can reproduce a problem with single threaded code and just zipfile, then please reopen this and attach the reproducer.

If you want to suggest a thread safe zipfile, then the best thing to do is start a discussion on discuss.python.org.
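
For anyone attempting the single-threaded reproducer requested above, a starting point could look like the following sketch. It uses only zipfile and the standard library; the function name, the data size, and the read pattern are arbitrary assumptions loosely modelled on the reported size of 131072. It returns the offsets at which a seek followed by a read either raised EOFError or returned bytes that differ from the source data, so an empty list means no problem was observed.

```python
import io
import random
import zipfile


def check_random_reads(data: bytes, reads: int = 50, seed: int = 0) -> list:
    """Return offsets where seek+read on a zip member disagreed with `data`."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("entry.bin", data)

    rng = random.Random(seed)  # fixed seed so a failure can be replayed
    bad = []
    with zipfile.ZipFile(buf) as zf, zf.open("entry.bin") as f:
        for _ in range(reads):
            size = rng.randint(1, 131072)
            offset = rng.randrange(len(data))
            try:
                f.seek(offset)
                chunk = f.read(size)
            except EOFError:
                bad.append(offset)  # premature EOF, as in the report
                continue
            if chunk != data[offset:offset + size]:
                bad.append(offset)
    return bad


data = random.Random(1).randbytes(2 * 1024 * 1024)  # requires Python 3.9+
print(check_random_reads(data))
```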

@mcepl
Contributor

mcepl commented Mar 31, 2023

I think I can reproduce this with epy (current master, commit c7a87f3), and Python 3.10.10 (from openSUSE/Tumbleweed packages):

stitny~/K/f/reading_the_books$ epy Twenty\ Four\ Years\ Later-ao3_29298975.epub 
Traceback (most recent call last):
  File "/home/matej/.bin/epy", line 8, in <module>
    sys.exit(main())
  File "/home/matej/archiv/knihovna/repos/epy/src/epy_reader/__main__.py", line 18, in main
    filepath = curses.wrapper(reader.start_reading, filepath)
  File "/usr/lib64/python3.10/curses/__init__.py", line 94, in wrapper
    return func(stdscr, *args, **kwds)
  File "/home/matej/archiv/knihovna/repos/epy/src/epy_reader/reader.py", line 1599, in start_reading
    reading_state_or_ebook = reader.read(reading_state)
  File "/home/matej/archiv/knihovna/repos/epy/src/epy_reader/reader.py", line 847, in read
    text_structure, toc_entries, contents = self.get_current_book_content(reading_state)
  File "/home/matej/archiv/knihovna/repos/epy/src/epy_reader/reader.py", line 797, in get_current_book_content
    content = self.ebook.get_raw_text(content_path)
  File "/home/matej/archiv/knihovna/repos/epy/src/epy_reader/ebooks/epub.py", line 188, in get_raw_text
    content = self.file.open(content_path).read()
  File "/usr/lib64/python3.10/zipfile.py", line 911, in read
    buf += self._read1(self.MAX_N)
  File "/usr/lib64/python3.10/zipfile.py", line 993, in _read1
    data += self._read2(n - len(data))
  File "/usr/lib64/python3.10/zipfile.py", line 1028, in _read2
    raise EOFError
EOFError
stitny~/K/f/reading_the_books$ 

It doesn’t happen with all EPubs, but quite often (like 50:50?).

(Yes, I checked with other EPub readers that Twenty Four Years Later-ao3_29298975.epub (renamed to *.zip because of the GitHub rules for attachment naming) is a correct EPub.)

AFAIK, epy is a rather simple, certainly single-threaded, application.

@gpshead gpshead self-assigned this Mar 31, 2023
@iritkatriel iritkatriel added the 3.12 only security fixes label Apr 5, 2023
@thatch
Contributor

thatch commented May 20, 2025

I don't think anything looks that weird about the zip file, but do note the current version of epy-reader does use multiprocessing and has several comments that appear to be observed issues decompressing with multiprocessing. Can you try with that turned off?

@danifus
Contributor

danifus commented May 20, 2025

The committed fix #127856 resolved a couple of bugs regarding concurrent reading of files in a zip. If this is related to multiprocessing, it may be worth checking whether the bug still exists in newer versions of Python.

@mcepl
Contributor

mcepl commented May 21, 2025

(Un)fortunately, epy got fixed in wustho/epy@ee3d693, so it doesn’t crash any more.


10 participants