Skip to content

bpo-43260: io: Prevent large data remains in textio buffer. #24592

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Feb 21, 2021

Conversation

methane
Copy link
Member

@methane methane commented Feb 20, 2021

When very large data remains in TextIOWrapper, it can not flush the internal
buffer forever because of MemoryError.

bpo-43260

When very large data remains in TextIOWrapper, flush() may fail forever.
So prevent large (i.e. 1MiB) data remains in TextIOWrapper internal
buffer.
@eryksun
Copy link
Contributor

eryksun commented Feb 20, 2021

This needs a test in a subprocess. In POSIX the maximum available address space or heap size can be easily controlled. For example:

>>> import resource
>>> hard_limit = resource.getrlimit(resource.RLIMIT_AS)[1]
>>> resource.setrlimit(resource.RLIMIT_AS, (128 * (1<<20), hard_limit))
>>> f = open('test.txt', 'w')
>>> f.write('a' * (64 * (1<<20)))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
MemoryError
>>> f.write('a')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
MemoryError

It's possible in Windows with kernel Job objects, but Python's standard library doesn't support them, not without ctypes.

@methane
Copy link
Member Author

methane commented Feb 20, 2021

@eryksun I don't like such tests. Such tests are fragile.

@methane
Copy link
Member Author

methane commented Feb 21, 2021

Having buffer in Pure Python TextIOWrapper doesn't make sense. It is just an overhead.

@eryksun
Copy link
Contributor

eryksun commented Feb 21, 2021

Having buffer in Pure Python TextIOWrapper doesn't make sense. It is just an overhead.

That's fine. I just wanted to know if there was more work to do in order to align this behavior with _pyio.

_pyio.TextIOWrapper is intentionally designed to behave as if write_through is always enabled. I do not understand this decision, but it is what it is. To me it seems that the optimization to internally buffer small writes is fundamental to the behavior of io.TextIOWrapper, i.e. it is not an internal implementation detail since it fundamentally alters the behavior. So it should be emulated by _pyio regardless of the effect on performance at this level.

@methane
Copy link
Member Author

methane commented Feb 21, 2021

To me it seems that the optimization to internally buffer small writes is fundamental to the behavior of io.TextIOWrapper, i.e. it is not an internal implementation detail since it fundamentally alters the behavior. So it should be emulated by _pyio regardless of the effect on performance at this level.

I don't think so. There is a BufferedIO under the TextIOWrapper. No need to add one additonal buffer layer.

C implementation of TextIOWrapper is fast. Since it is very fast, calling BufferedIO.write() is slower than appending internal buffer.

On the other hand, Python implementation is not so fast. Calling overhead is not so large. So adding one more buffering layer isn't worth enough.

@eryksun
Copy link
Contributor

eryksun commented Feb 21, 2021

On the other hand, Python implementation is not so fast. Calling overhead is not so large. So adding one more buffering layer isn't worth enough.

Again, it's nothing to do with performance, but instead matching the observable behavior of io.TextIOWrapper with regard to when and how much it writes to the underlying buffer, which I expected _pyio.TextIOWrapper to do for the sake of valid experimentation. When I asked the question, I wasn't aware that it's intentionally designed to deviate from the observable behavior of io.TextIOWrapper in this case.

@methane methane merged commit 01806d5 into python:master Feb 21, 2021
@methane methane deleted the textio-largebuf branch February 21, 2021 23:29
@miss-islington
Copy link
Contributor

Thanks @methane for the PR 🌮🎉.. I'm working now to backport this PR to: 3.8, 3.9.
🐍🍒⛏🤖

@miss-islington
Copy link
Contributor

Sorry @methane, I had trouble checking out the 3.9 backport branch.
Please backport using cherry_picker on command line.
cherry_picker 01806d5beba3d208bb56adba6829097d803bf54f 3.9

@miss-islington
Copy link
Contributor

Sorry, @methane, I could not cleanly backport this to 3.8 due to a conflict.
Please backport using cherry_picker on command line.
cherry_picker 01806d5beba3d208bb56adba6829097d803bf54f 3.8

methane added a commit to methane/cpython that referenced this pull request Feb 21, 2021
…thonGH-24592)

When very large data remains in TextIOWrapper, flush() may fail forever.

So prevent that data larger than chunk_size is remained in TextIOWrapper internal
buffer.

Co-Authored-By: Eryk Sun
(cherry picked from commit 01806d5)

Co-authored-by: Inada Naoki <[email protected]>
@bedevere-bot
Copy link

GH-24617 is a backport of this pull request to the 3.9 branch.

@miss-islington
Copy link
Contributor

Thanks @methane for the PR 🌮🎉.. I'm working now to backport this PR to: 3.8.
🐍🍒⛏🤖

@miss-islington
Copy link
Contributor

Sorry @methane, I had trouble checking out the 3.8 backport branch.
Please backport using cherry_picker on command line.
cherry_picker 01806d5beba3d208bb56adba6829097d803bf54f 3.8

@bedevere-bot
Copy link

GH-24618 is a backport of this pull request to the 3.8 branch.

methane added a commit to methane/cpython that referenced this pull request Feb 21, 2021
…thonGH-24592)

When very large data remains in TextIOWrapper, flush() may fail forever.

So prevent that data larger than chunk_size is remained in TextIOWrapper internal
buffer.

Co-Authored-By: Eryk Sun.
(cherry picked from commit 01806d5)

Co-authored-by: Inada Naoki <[email protected]>
methane added a commit that referenced this pull request Feb 22, 2021
When very large data remains in TextIOWrapper, flush() may fail forever.

So prevent that data larger than chunk_size is remained in TextIOWrapper internal
buffer.

Co-Authored-By: Eryk Sun
(cherry picked from commit 01806d5)
methane added a commit that referenced this pull request Feb 22, 2021
When very large data remains in TextIOWrapper, flush() may fail forever.

So prevent that data larger than chunk_size is remained in TextIOWrapper internal
buffer.

Co-Authored-By: Eryk Sun.
(cherry picked from commit 01806d5)
adorilson pushed a commit to adorilson/cpython that referenced this pull request Mar 13, 2021
…-24592)

When very large data remains in TextIOWrapper, flush() may fail forever.

So prevent that data larger than chunk_size is remained in TextIOWrapper internal
buffer.

Co-Authored-By: Eryk Sun
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants