Waiting for asyncio.StreamWriter.drain() twice in parallel raises an AssertionError when the transport stopped writing #74116

Closed
metathink mannequin opened this issue Mar 28, 2017 · 12 comments
Assignee: kumaraditya303
Labels: 3.10, 3.11, 3.12, topic-asyncio, type-bug

Comments


metathink mannequin commented Mar 28, 2017

BPO 29930
Nosy @vstinner, @cjerdonek, @socketpair, @1st1, @RemiCardona, @jnwatson, @CallumQuick
Files
  • server.py
  • client.py
  • drain_multiple_waiters.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2017-03-28.09:45:54.697>
    labels = ['type-bug', 'expert-asyncio']
    title = 'Waiting for asyncio.StreamWriter.drain() twice in parallel raises an AssertionError when the transport stopped writing'
    updated_at = <Date 2019-11-04.15:20:53.898>
    user = 'https://bugs.python.org/metathink'

    bugs.python.org fields:

    activity = <Date 2019-11-04.15:20:53.898>
    actor = 'callumquick'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['asyncio']
    creation = <Date 2017-03-28.09:45:54.697>
    creator = 'metathink'
    dependencies = []
    files = ['46761', '46762', '46763']
    hgrepos = []
    issue_num = 29930
    keywords = ['patch']
    message_count = 11.0
    messages = ['290690', '290702', '290703', '290704', '290706', '290732', '290760', '290773', '298981', '301767', '355951']
    nosy_count = 9.0
    nosy_names = ['vstinner', 'chris.jerdonek', 'socketpair', 'aymeric.augustin', 'yselivanov', 'RemiCardona', 'jnwatson', 'metathink', 'callumquick']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = None
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue29930'
    versions = ['Python 3.6']


    metathink mannequin commented Mar 28, 2017

    While trying to break some code in a project using asyncio, I found that under certain circumstances, asyncio.StreamWriter.drain raises an AssertionError.

    1. There must be a lot of concurrent uses of "await writer.drain()"
    2. The server we send data to must be public; no AssertionError occurs while connected to 127.0.0.1
    Task exception was never retrieved
    future: <Task finished coro=<flooding() done, defined at client.py:10> exception=AssertionError()>
    Traceback (most recent call last):
      File "client.py", line 12, in flooding
        await writer.drain()
      File "/usr/local/lib/python3.6/asyncio/streams.py", line 333, in drain
        yield from self._protocol._drain_helper()
      File "/usr/local/lib/python3.6/asyncio/streams.py", line 208, in _drain_helper
        assert waiter is None or waiter.cancelled()
    AssertionError

    I don't know much about how the drain function works or how networking is handled by the OS, but I'm assuming that I've reached some OS limitation which triggers this AssertionError.

    I'm not sure how I'm supposed to handle that. Am I supposed to add some throttling because I should not send too much data concurrently? Is this considered a bug? Any explanations are welcome.

    Here are some minimal client and server examples (attached as server.py and client.py) if you want to try to reproduce it:
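    The attached files aren't inlined on this page; the sketch below captures the same idea (the port, buffer sizes, and coroutine counts are illustrative, not the attached code):

    import asyncio

    # server.py (sketch): read very slowly, so the client's transport
    # buffer fills past the high water mark and writing pauses.
    async def handle(reader, writer):
        while True:
            await asyncio.sleep(1)
            await reader.read(1024)

    # client.py (sketch): many coroutines await drain() on one writer in
    # parallel; once the transport pauses writing, the assertion fires.
    async def flooding(writer):
        while True:
            writer.write(b'*' * 4096)
            await writer.drain()

    async def flood(host):
        reader, writer = await asyncio.open_connection(host, 8888)
        await asyncio.gather(*(flooding(writer) for _ in range(100)))

    # server: loop.run_until_complete(asyncio.start_server(handle, '0.0.0.0', 8888)); loop.run_forever()
    # client: loop.run_until_complete(flood('<some non-local address>'))
    # where loop = asyncio.get_event_loop()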

    Also, I don't think this is limited to Python 3.6; I've found this old issue in aaugustin's websockets repo which looks like the same thing: python-websockets/websockets#16

    @metathink metathink mannequin added topic-asyncio type-bug An unexpected behavior, bug, or error labels Mar 28, 2017
    @vstinner (Member) commented:

    Modified client and server to be able to reproduce the issue on a LAN.

    @vstinner (Member) commented:

    The bug occurs when the transport pauses writing and the client code calls drain() multiple times in parallel.

    @vstinner (Member) commented:

    I understood that:

    • code fills the write buffer of the transport until writing is paused because of the high water mark
    • a function calls drain(), which waits until the server has read enough data that the write buffer shrinks below the low water mark
    • a second function calls drain() while the first is still waiting: the bug occurs because the code doesn't support two coroutines waiting on drain() in parallel
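    Concretely, the single-waiter assumption lives in FlowControlMixin._drain_helper in asyncio/streams.py; paraphrasing the Python 3.6 code from the traceback above (a sketch, not a verbatim copy):

    @asyncio.coroutine
    def _drain_helper(self):
        if self._connection_lost:
            raise ConnectionResetError('Connection lost')
        if not self._paused:
            return
        waiter = self._drain_waiter
        # there is only ONE waiter slot: a second parallel drain() while
        # the transport is paused finds a live waiter and the assert fires
        assert waiter is None or waiter.cancelled()
        waiter = self._loop.create_future()
        self._drain_waiter = waiter
        yield from waiter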

    Metathink told me that he got the bug on a much more complex code using websockets. Thank you Metathink for isolating the bug to a few lines of Python code with simpler asyncio functions!

    @vstinner (Member) commented:

    Proof-of-concept of patched drain() to support multiple waiters.

    I don't see any strong reason not to allow two coroutines to wait on drain() in parallel.

    I'm too lazy to write a full patch with unit tests, doc changes, etc., so I started with a PoC to discuss the solution.
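    The attached patch isn't shown inline; the sketch below is a reconstruction of the general approach (not necessarily the attached drain_multiple_waiters.patch): replace the single waiter slot with a collection and wake every waiter on resume.

    import collections

    class FlowControlMixin:  # sketch: only the relevant parts
        def __init__(self, loop):
            self._loop = loop
            self._paused = False
            self._connection_lost = False
            self._drain_waiters = collections.deque()

        def resume_writing(self):
            self._paused = False
            # wake every coroutine blocked in drain(), not just one
            while self._drain_waiters:
                waiter = self._drain_waiters.popleft()
                if not waiter.done():
                    waiter.set_result(None)

        async def _drain_helper(self):
            if self._connection_lost:
                raise ConnectionResetError('Connection lost')
            if not self._paused:
                return
            waiter = self._loop.create_future()
            self._drain_waiters.append(waiter)
            try:
                await waiter
            finally:
                if waiter in self._drain_waiters:
                    self._drain_waiters.remove(waiter)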

    @vstinner vstinner changed the title asyncio.StreamWriter.drain raises an AssertionError under heavy use Waiting for asyncio.StreamWriter.drain() twice in parallel raises an AssertionError when the transport stopped writing Mar 28, 2017

    aymericaugustin mannequin commented Mar 28, 2017

    For context, websockets calls yield from self.writer.drain() after each write in order to provide backpressure.

    If the output buffer fills up, calling API coroutines that write to the websocket connection becomes slow and hopefully the backpressure will propagate (in a correctly implemented application).

    This is a straightforward application of the only use case described in the documentation.
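    In async/await syntax the pattern is just the following (a sketch; send is a stand-in here, not the actual websockets API):

    async def send(writer, message):
        writer.write(message)
        # backpressure: if the transport buffered past the high water mark,
        # this blocks the caller until the buffer drains below the low mark
        await writer.drain()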

    ----

    I would find it annoying to have to serialize calls to drain() myself. It doesn't feel like something the "application" should care about. (websockets is the application from asyncio's perspective.)

    I'm wondering if it could be a problem if a bunch of coroutines were waiting on drain() and got released simultaneously. I don't think it would be a problem for websockets. Since my use case seems typical, there's a good chance this also applies to other apps.

    So I'm in favor of simply allowing an arbitrary number of coroutines to wait on drain() in parallel, if that's feasible.


    RemiCardona mannequin commented Mar 28, 2017

    Got hit by this too, maaaaany times as I'm working with 3G devices (slow write speeds).

    As for "drain()", I'd say it should work like a fence/barrier: it lets you know that the bytes in the buffer up to when the call was issued have been successfully written to the socket.

    I'll see what I can cook up.

    Cheers


    aymericaugustin mannequin commented Mar 29, 2017

    drain() returns when the write buffer reaches the low water mark, not when it's empty, so you don't have a guarantee that your bytes were written to the socket.

    Pause and resume calls are paired -- pause_writing() is called
    once when the buffer goes strictly over the high-water mark
    (even if subsequent writes increases the buffer size even
    more), and eventually resume_writing() is called once when the
    buffer size reaches the low-water mark.

    The high water mark defaults to 64 KiB (high = 64 * 1024 in asyncio/transports.py) and the low water mark to a quarter of that, 16 KiB.

    With websockets, the recommended way to ensure your message was received is:

    yield from ws.send(...)
    yield from ws.ping()

    Given that TCP guarantees ordering, the ping response can only be received after the previous message was fully sent and received.

    Of course the ping can fail even though the message was received; that's the classical at-most-once vs. at-least-once question.

    The technique you suggest requires setting the low and high water marks to 0. I'm not sure this is the best way to achieve your goals, since you still don't control the OS buffers.
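    For illustration, that technique would look like the following sketch (using the real Transport.set_write_buffer_limits API; note that data may still sit in OS-level buffers afterwards):

    async def write_fully(writer, data):
        # with high=0, any buffered byte pauses writing, and writing only
        # resumes (so drain() only returns) once asyncio's buffer is empty
        writer.transport.set_write_buffer_limits(high=0, low=0)
        writer.write(data)
        await writer.drain()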


    socketpair mannequin commented Jul 24, 2017

    Triggered almost the same error. Minimal proof:

    The documentation doesn't say that .drain() can't be called simultaneously.

    ===================================

    import asyncio

    async def do_nothing(client_reader, client_writer):
        # hold the connection open but never read, so the peer's
        # write buffer can only grow
        await asyncio.sleep(10000)
    
    mb = b'*' * (4096*4)
    async def write_cmd(writer):
        writer.write(mb)
        await writer.drain()
    
    async def amain():
        # b'\x00qwe' is a Linux abstract-namespace unix socket address
        srv = await asyncio.start_unix_server(do_nothing, b'\x00qwe')
        (reader, writer) = await asyncio.open_unix_connection(b'\x00qwe')
        # 200 coroutines await drain() on the same writer concurrently
        await asyncio.gather(*(write_cmd(writer) for _ in range(200)))
    
    loop = asyncio.get_event_loop()
    loop.run_until_complete(amain())

    ===================================


    aymericaugustin mannequin commented Sep 9, 2017

    I worked around this bug in websockets by serializing access to drain() with a lock:

    python-websockets/websockets@198b715

    I suspect this is inefficient but it beats crashing.
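    The same workaround sketched in modern async/await syntax (LockedWriter is a hypothetical wrapper; the linked commit takes a lock around its own drain() calls instead):

    import asyncio

    class LockedWriter:
        """Hypothetical wrapper: serialize drain() so that at most one
        coroutine at a time waits on the protocol's single drain waiter."""

        def __init__(self, writer):
            self._writer = writer
            self._lock = asyncio.Lock()

        def write(self, data):
            self._writer.write(data)

        async def drain(self):
            async with self._lock:
                await self._writer.drain()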


    callumquick mannequin commented Nov 4, 2019

    Hi, I'm a new contributor: is there any consensus on whether something needs to be done, and what? If so, I can try to take the patch forward.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @kumaraditya303 kumaraditya303 added 3.11 only security fixes 3.10 only security fixes 3.12 only security fixes labels Jul 9, 2022
    @kumaraditya303 kumaraditya303 self-assigned this Jul 9, 2022
    @ezio-melotti ezio-melotti moved this to Todo in asyncio Jul 17, 2022
    miss-islington pushed a commit to miss-islington/cpython that referenced this issue Aug 29, 2022
    miss-islington pushed a commit to miss-islington/cpython that referenced this issue Aug 29, 2022
    pablogsal pushed a commit that referenced this issue Aug 30, 2022
    …4705) (#96395)

    (cherry picked from commit e5b2453)

    Co-authored-by: Kumar Aditya
    miss-islington added a commit that referenced this issue Sep 8, 2022
    @kumaraditya303 (Contributor) commented:

    Fixed by #94705 and backported to 3.11 and 3.10.

    Repository owner moved this from Todo to Done in asyncio Sep 9, 2022