[RFC] Brainstorming generic async write support for advanced streams #3396

Closed
@pfalcon

When asked why on Earth write operations in asyncio are ordinary functions instead of coroutines, Guido van Rossum answered something like "There's a big difference between read and write operations". Yeah, something like that. Compare, for example, a read on a websocket or SSL connection: to decode data, it needs a complete underlying record, and until it has one, there's no other choice but to buffer the partial record internally and just return None (not ready) from .read().

But with .write(), the following situation is normal: suppose .write() was called with 100 bytes of input, 64 bytes of that input were processed and turned into 256 bytes of data which need to be written to the underlying stream. Then 128 bytes of that were written to it before EAGAIN hit. So, what should .write() return? Apparently, no single return value can be useful, and there's no easy way for the calling code to continue the operation.
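To make the numbers above concrete, here is a minimal sketch in plain Python. Everything in it is hypothetical and for illustration only: `FakeNonBlockingStream` stands in for a non-blocking socket, `expand` is a toy transform that turns each input byte into 4 output bytes, and `naive_write` is a write method that tries to do the stream write itself.

```python
class FakeNonBlockingStream:
    """Hypothetical non-blocking stream: accepts `capacity` bytes in total,
    then returns None to signal "would block" (EAGAIN)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = bytearray()

    def write(self, buf):
        room = self.capacity - len(self.data)
        if room <= 0:
            return None
        n = min(room, len(buf))
        self.data += buf[:n]
        return n


def expand(chunk):
    # Toy transform: every input byte becomes 4 output bytes.
    return b"".join(bytes([b]) * 4 for b in chunk)


def naive_write(stream, data, block=64):
    # Process one block of input and try to push the result downstream.
    processed = data[:block]
    out = expand(processed)
    written = stream.write(out) or 0
    # Three numbers describe the state; no single return value covers them.
    return len(processed), len(out), written


stream = FakeNonBlockingStream(capacity=128)
state = naive_write(stream, bytes(100))
print(state)
```

The caller would need all three numbers (input consumed, output produced, output actually written) plus the unwritten tail of the output to resume, which is exactly what an ordinary `.write()` return value cannot carry.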

So, one idea is that .write() methods intended for async usage should not try to perform the stream write op themselves (that requires a coroutine), but instead return the following context:

[1] number of input bytes processed, pointer to buffer of output data, and its length.
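A rough Python rendering of scheme [1] might look like the sketch below. The names `HexEncoder`, `write_step` and `drive` are made up for this example, and hex encoding is just a stand-in for a real transform such as SSL or websocket framing. In Python the "pointer plus length" pair collapses into a single bytes object.

```python
import binascii


class HexEncoder:
    """Hypothetical transform stream: hex-encodes input, `block` bytes at a time."""

    def __init__(self, block=4):
        self.block = block

    def write_step(self, data):
        # Does no I/O. Returns (input bytes consumed, output buffer to send).
        chunk = bytes(data[: self.block])
        return len(chunk), binascii.hexlify(chunk)


def drive(encoder, data, sink):
    # The caller owns the actual stream write. In real async code the
    # append below would be an awaited write to the underlying stream.
    view = memoryview(data)
    while len(view):
        consumed, out = encoder.write_step(view)
        sink.append(out)
        view = view[consumed:]


sink = []
drive(HexEncoder(), b"hello!", sink)
print(sink)
```

The point of the split is that `write_step` never blocks and never sees EAGAIN; only the driver loop does, and the driver can be a coroutine.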

One difference from sync write methods is that async ones apparently require much heavier buffering; otherwise the internal state machine gets very complex, or it's simply impossible to "chunk" input data for continuous operation. For example, in the websocket case, the initial data to write is a small message header. But there's no way to output just that in scheme [1], because no input bytes are consumed in this case, so next time the client code would restart sending from the same position (which would lead to sending just the message header, in an infinite loop). And note that in the websocket case, the data after the header might be sent as is, so artificially chunking it into multiple messages isn't ideal.
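The header problem can be reproduced with a deliberately simplistic sketch. `StatelessFramer` and `drive_bounded` are hypothetical names; the framer keeps no state between calls, which is the case where scheme [1] breaks down. The two header bytes are an unmasked websocket text frame header for a 5-byte payload, and the loop is capped at 3 steps so the example terminates.

```python
class StatelessFramer:
    """Hypothetical framer with no internal state: every call starts a new
    message, so every call emits the header and consumes no input."""

    HEADER = b"\x81\x05"  # FIN + text opcode, unmasked, payload length 5

    def write_step(self, data):
        # Returns (input bytes consumed, output buffer to send).
        return 0, self.HEADER


def drive_bounded(framer, data, max_steps=3):
    # Same driver shape as scheme [1], but capped so it cannot spin forever.
    sent = []
    view = memoryview(data)
    for _ in range(max_steps):
        if not len(view):
            break
        consumed, out = framer.write_step(view)
        sent.append(out)
        view = view[consumed:]
    return sent, len(view)


sent, remaining = drive_bounded(StatelessFramer(), b"hello")
print(sent, remaining)
```

Every step resends the header and the payload never advances. Avoiding this means the framer has to remember that the header already went out, which is the extra state machine complexity mentioned above.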

So, model [1], while already not exactly easy, might not be adequate or optimal enough.

[2] The ultimate solution would be to make async write methods coroutines, but we don't know how to do that for C methods, and devising a fully generic scheme may take some effort.

So, some partial "coro-like" scheme may be a compromise, and effectively [1] can be seen as a first iteration of that idea.

[3] A more elaborate idea could be: an async writer is a multi-stage process. It starts with setting the data to write, and then another sub-operation is called, which repeatedly returns (ptr, len) pairs for chunks to be written. That's already pretty much a coro.
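Idea [3] could be sketched in Python roughly as follows. `FrameWriter`, `set_data`, `next_chunk` and `send` are hypothetical names, the framing assumes an unmasked websocket text frame with a payload shorter than 126 bytes, and `fake_stream_write` stands in for the real underlying stream.

```python
import asyncio


class FrameWriter:
    """Hypothetical two-stage writer: set_data() once, then next_chunk()
    repeatedly until it returns None. It performs no I/O itself."""

    def set_data(self, payload):
        # Assumption: unmasked text frame, payload shorter than 126 bytes.
        header = bytes([0x81, len(payload)])
        self._chunks = [header, payload]
        self._idx = 0

    def next_chunk(self):
        if self._idx >= len(self._chunks):
            return None
        chunk = self._chunks[self._idx]
        self._idx += 1
        return chunk


async def send(writer, payload, stream_write):
    # The coroutine part lives in the caller, not in the writer object.
    writer.set_data(payload)
    while True:
        chunk = writer.next_chunk()
        if chunk is None:
            break
        await stream_write(chunk)


out = []


async def fake_stream_write(chunk):
    out.append(bytes(chunk))


asyncio.run(send(FrameWriter(), b"hello", fake_stream_write))
print(out)
```

Unlike scheme [1], the header can go out on its own here, and the payload is passed through as is without being copied or split into several messages.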

[4] Yet another idea is that ultimately this stuff needs to be written into a real stream implementation, and writing single bytes or short records is inefficient. So maybe each async writer should be coupled with a buffer, and it should be possible to query the remaining space in that buffer and then either write the next chunk of data, or request that a given amount of space be made available (to write a fixed-size record in one go). This idea goes in the direction of CPython's asyncio implementation.
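A minimal sketch of idea [4], assuming a fixed-size buffer in front of the real stream. `WriteBuffer`, `space`, `drain`, `reserve` and `fake_flush` are hypothetical names for illustration, not an existing API.

```python
import asyncio


class WriteBuffer:
    """Hypothetical fixed-size buffer coupled with an async writer."""

    def __init__(self, size):
        self._buf = bytearray(size)
        self._used = 0

    def capacity(self):
        return len(self._buf)

    def space(self):
        # Remaining free space, in bytes.
        return len(self._buf) - self._used

    def write(self, data):
        # Copies as much as fits and returns the number of bytes accepted.
        n = min(self.space(), len(data))
        self._buf[self._used : self._used + n] = data[:n]
        self._used += n
        return n

    def drain(self):
        # Hands the buffered bytes to the caller and empties the buffer.
        out = bytes(self._buf[: self._used])
        self._used = 0
        return out


async def reserve(buf, n, flush):
    # Request that n bytes of space be available, flushing if necessary,
    # so that a fixed-size record can then be written in one go.
    if n > buf.capacity():
        raise ValueError("record larger than buffer")
    if buf.space() < n:
        await flush(buf.drain())


flushed = []


async def fake_flush(data):
    flushed.append(data)


async def demo():
    buf = WriteBuffer(8)
    buf.write(b"abcdef")               # 6 of 8 bytes used
    await reserve(buf, 4, fake_flush)  # only 2 free, so the buffer is flushed
    buf.write(b"WXYZ")                 # the 4-byte record now fits whole
    return buf


buf = asyncio.run(demo())
print(flushed, buf.space())
```

Only `reserve` needs to be a coroutine in this sketch; `space` and `write` stay ordinary functions, which is roughly the split the issue is looking for.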
