Too early EOFError #101911
Comments
@serhiy-storchaka, @gpshead (as zipfile experts that were seen this week) |
It was working with commit f9b3706 (committed by erlend-aasland on Jul 22, 2022) and earlier. This was validated by … However, without a fresh clone I see carry-over from previous checkouts, and the results differ depending on what was checked out before. Is there a more efficient way to build Python for a particular checkout? |
Take a look at https://git-scm.com/docs/git-bisect |
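For readers following the git bisect suggestion: `git bisect run` marks a commit good when the given command exits with 0, bad for most other exit codes, and skips the commit on exit code 125. Below is a minimal sketch of a check that could serve as such a command, assuming the regression can be triggered with plain zipfile (the thread does not confirm this). The names `seek_and_read_ok` and `main`, the payload, and the offsets are made up for illustration.

```python
import io
import zipfile

# 1 MiB of known data, so any slice can be verified after a seek.
PAYLOAD = bytes(range(256)) * 4096


def seek_and_read_ok(offset=700_000, size=4096):
    """Return True if a forward seek in a stored entry yields the expected bytes."""
    buf = io.BytesIO()
    # ZIP_STORED keeps the entry uncompressed.
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        zf.writestr("data.bin", PAYLOAD)
    with zipfile.ZipFile(buf) as zf, zf.open("data.bin") as entry:
        entry.seek(offset)
        chunk = entry.read(size)
    return chunk == PAYLOAD[offset:offset + size]


def main():
    """Map the check result to an exit code: 0 = good, 1 = bad."""
    try:
        ok = seek_and_read_ok()
    except EOFError:
        ok = False
    return 0 if ok else 1

# As a bisect script, end the file with: import sys; sys.exit(main())
```

A wrapper that rebuilds the interpreter for each checkout and then runs this script with the fresh build could be passed to `git bisect run`. Having that wrapper exit with 125 when the build fails lets bisect skip commits that cannot be tested.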
Thanks, I'll check it out.
|
I narrowed the regression down to the cpython commit below.
This is yours. This one was easy, as the error showed up quickly and reliably.
However, there is a second one which shows up sporadically and inconsistently.
It is more time-consuming and git bisect may take longer.
Many files need to be read before the error is triggered.
Python-3.11.1 is working well. A git bisect script is currently running and
will hopefully find the respective commit.
Best
Christoph
commit 330f1d5 BAD1
Author: JuniorJPDJ ***@***.***>
Date: Sun Aug 7 01:21:23 2022 +0200
gh-88339: enable fast seeking of uncompressed unencrypted
zipfile.ZipExtFile (GH-27737)
Avoid reading all of the intermediate data in uncompressed items in a
zip file when the user seeks forward.
Contributed by: @JuniorJPDJ
|
I found the other commit since which occasional, inconsistent problems occur when using Ziprofs: commit 26a162b BAD, commit 90e7230 GOOD |
Thanks again for the great trick with git bisect.
I found the other commit since which occasional, inconsistent problems occur
when using Ziprofs.
I contacted the committer via email. The problem cannot be reproduced with
simple shell commands.
It occurs eventually when processing a large number of files with
proprietary software.
That means there are two regressions which prevent using the recent cpython
version.
commit 26a162b **BAD**
Author: Jan Wolski ***@***.***>
Date: Sun May 15 17:49:19 2022 +0300
gh-89668: Optimize ZipFile file header processing algorithm to avoid
unneeded IO(gh-25966)
commit 90e7230 **GOOD**
Author: Victor Stinner ***@***.***>
Date: Sun May 15 11:19:52 2022 +0200
gh-92781: Avoid mixing declarations and code in C API (#92783)
Avoid mixing declarations and code in the C API to fix the compiler
warning: "ISO C90 forbids mixed declarations and code"
[-Werror=declaration-after-statement].
|
I found out that the issues are very likely due to f.seek(offset). A multithreaded application is loading zip entries via ziprofs. Looking at the function … When only one thread is reading at a time, … |
zipfile is not threadsafe, that's nothing new |
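Following the comment above, one way a multithreaded caller might avoid interleaved seeks is to serialize every open/seek/read sequence with a lock. This is only a sketch under that assumption. `LockedZipReader`, `read_range` and `demo` are hypothetical names and are not part of zipfile or ziprofs.

```python
import io
import threading
import zipfile


class LockedZipReader:
    """Hypothetical helper: serializes all access to one shared ZipFile."""

    def __init__(self, file):
        self._zf = zipfile.ZipFile(file)
        self._lock = threading.Lock()

    def read_range(self, name, offset, size):
        # Open, seek and read under one lock so that no other thread can
        # touch the shared archive between these steps.
        with self._lock:
            with self._zf.open(name) as entry:
                entry.seek(offset)
                return entry.read(size)

    def close(self):
        self._zf.close()


def demo():
    """Read eight ranges from eight threads and verify each one."""
    payload = bytes(range(256)) * 1024
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("data.bin", payload)

    reader = LockedZipReader(buf)
    results = {}

    def worker(offset):
        results[offset] = reader.read_range("data.bin", offset, 100)

    threads = [threading.Thread(target=worker, args=(o,))
               for o in range(0, 8000, 1000)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    reader.close()
    return len(results) == 8 and all(
        results[o] == payload[o:o + 100] for o in results)
```

The lock trades throughput for safety. Giving each thread its own ZipFile object is another option, at the cost of parsing the central directory once per thread.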
This looks like either a problem with ziprofs, a problem with user code, and/or a feature request for thread safe zipfile. So I’m going to close this. If you can reproduce a problem with single threaded code and just zipfile, then please reopen this and attach the reproducer. If you want to suggest a thread safe zipfile, then the best thing to do is start a discussion on discuss.python.org. |
I think I can reproduce this with epy (current master, commit c7a87f3), and Python 3.10.10 (from openSUSE/Tumbleweed packages):
It doesn’t happen with all EPubs, but quite often (like 50:50?). (Yes, I checked with other EPub readers that Twenty Four Years Later-ao3_29298975.epub (renamed to … AFAIK, epy is a rather simple, certainly single-threaded application. |
I don't think anything looks that weird about the zip file, but do note the current version of epy-reader does use multiprocessing and has several comments that appear to be observed issues decompressing with multiprocessing. Can you try with that turned off? |
This committed fix #127856 resolved a couple of bugs with regard to concurrent reading of files in a zip. If it is related to multiprocessing, it may be worth checking whether this bug still exists in newer versions of Python. |
(Un)fortunately, |
I am using ziprofs.py on Ubuntu. When I run it with Python-3.11.1 it works very well. With Python
3.12.0a4+ there are problems. In both cases, the same ziprofs.py and fuse.py files are used.
The only difference is the Python version.
Unfortunately, I was unable to demonstrate the problem using Linux
commands like wc, gzip, md5sum. So far, I have only seen this problem with proprietary closed-source software.
It appears with different files.
When this EOFError occurs in
def read(self,path,size,offset,fh) of ziprofs.py
the value of the parameter "size" was 131072 and that of "offset" was 75755520.
The size of the file is 525537280 bytes.
Strikingly, the low value of offset suggests that the EOF has not yet been reached.
Many thanks
C
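The arithmetic behind that observation can be checked directly. The sketch below uses a hypothetical helper, `expected_read_length`, to compute how many bytes a read at the reported offset should return. It assumes the usual convention that a read is truncated only at the end of the file.

```python
def expected_read_length(file_size, offset, size):
    """Bytes a read of `size` at `offset` should return for a file of `file_size`."""
    return max(0, min(size, file_size - offset))


file_size = 525_537_280  # reported file size in bytes
offset = 75_755_520      # reported "offset" parameter
size = 131_072           # reported "size" parameter

print(offset + size)                                  # 75886592
print(expected_read_length(file_size, offset, size))  # 131072
```

The requested range ends at byte 75886592, far below the file size, so a full 131072-byte read would be expected rather than an EOFError.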