Skip to content

hashlib.file_digest() can't handle non-blocking I/O #122179

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
srittau opened this issue Jul 23, 2024 · 7 comments
Closed

hashlib.file_digest() can't handle non-blocking I/O #122179

srittau opened this issue Jul 23, 2024 · 7 comments
Labels
type-bug An unexpected behavior, bug, or error

Comments

@srittau
Copy link
Contributor

srittau commented Jul 23, 2024

Bug report

Bug description:

This came up in python/typeshed#12414.

The current implementation of file_digest() does not check the return value of fileobj.readinto() for None:

cpython/Lib/hashlib.py

Lines 232 to 236 in 2a5d1eb

while True:
size = fileobj.readinto(buf)
if size == 0:
break # EOF
digestobj.update(view[:size])

While buffered file objects can't return None, unbuffered ones can when they are doing non-blocking I/O. Specifically, file_digest() is documented to take SocketIO objects, which can very much return None:

cpython/Lib/socket.py

Lines 694 to 714 in 2a5d1eb

def readinto(self, b):
"""Read up to len(b) bytes into the writable buffer *b* and return
the number of bytes read. If the socket is non-blocking and no bytes
are available, None is returned.
If *b* is non-empty, a 0 return value indicates that the connection
was shutdown at the other end.
"""
self._checkClosed()
self._checkReadable()
if self._timeout_occurred:
raise OSError("cannot read from timed out object")
try:
return self._sock.recv_into(b)
except timeout:
self._timeout_occurred = True
raise
except error as e:
if e.errno in _blocking_errnos:
return None
raise

CPython versions tested on:

CPython main branch

Operating systems tested on:

Other

Linked PRs

@srittau srittau added the type-bug An unexpected behavior, bug, or error label Jul 23, 2024
@srittau
Copy link
Contributor Author

srittau commented Jul 23, 2024

The solution could be as easy as adding if size is None: continue, possibly adding a short (.1s?) delay, but I'm not terribly familiar with non-blocking I/O.

@graingert
Copy link
Contributor

I suspect you want if size is None: raise BlockingIOError

@graingert
Copy link
Contributor

possibly adding a short (.1s?) delay

This wouldn't help, as when using non blocking IO it's likely that no new data will arrive until another task running on this thread gets executed

@gpshead
Copy link
Member

gpshead commented Jul 31, 2024

FWIW, I do not think it is reasonable to expect that APIs taking a file object as input in order to read or write data from it to be expected to handle non-blocking IO unless they explicitly document themselves as doing so. It isn't the default expectation for what a "file object" is.

hashlib.file_digest() refers to "SocketIO objects from socket.makefile()" which documents itself as "The socket must be in blocking mode". So the problem statement in this issue isn't quite accurate on that detail.

We should add the word "blocking" to the file_digest documentation to make the implied constraint more clear.

The proposed PR as is would make it explicitly raise an error when a file read did return None internally implying it was in non-blocking mode. It's at least a nicer error to read than trying to figure out why something raised TypeError due to None being used when bytes was expected.

@srittau
Copy link
Contributor Author

srittau commented Jul 31, 2024

hashlib.file_digest() refers to "SocketIO objects from socket.makefile()" which documents itself as "The socket must be in blocking mode". So the problem statement in this issue isn't quite accurate on that detail.

We should add the word "blocking" to the file_digest documentation to make the implied constraint more clear.

Good catch. I'll amend the docs tomorrow, unless someone beats me to it. (I think all Python maintainers can commit to this PR, and should feel free to do so.)

@graingert
Copy link
Contributor

graingert commented Jul 31, 2024

It's not clear to me why socket.makefile() has this defensive coding around non blocking sockets when they're documented as not supported. Checking blame it was added due to #54063

gpshead pushed a commit that referenced this issue Apr 21, 2025
* Fix hashlib.file_digest and non-blocking I/O
* Add documentation around this behavior
* Add versionchanged
@gpshead
Copy link
Member

gpshead commented Apr 21, 2025

thanks!

@gpshead gpshead closed this as completed Apr 21, 2025
gpshead added a commit that referenced this issue Apr 21, 2025
…2787)

gh-122179: Fix hashlib.file_digest and non-blocking I/O (GH-122183)

* Fix hashlib.file_digest and non-blocking I/O
* Add documentation around this behavior
* Add versionchanged

(cherry picked from commit 2b47f46)

Co-authored-by: Sebastian Rittau <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

3 participants