Skip to content

Refactor multipart decoding implementation and add new API #25

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

yawkat
Copy link
Contributor

@yawkat yawkat commented Nov 29, 2024

This PR adds a new lower-level post body parsing API (PostBodyDecoder). Right now the only implementation is the MultipartDecoder because that's the difficult one, but I plan to implement URL decoding too.

The new API is purely stream-based and leaves the data management (e.g. writing to disk) to the caller. This makes it more compatible with netty 5 buffers, makes buffer lifecycles easier to understand to avoid buffer leaks, and simplifies the implementation to allow for future optimization.

Because multipart is incredibly complex and the old decoder is the result of years of issue reports for compatibility with obscure clients, and I don't have confidence in the existing test coverage, my main focus in this PR has been validating compatibility. To do this, I've ported HttpPostMultipartRequestDecoder to use the new low-level API internally, and created a fuzz test (MultipartComparisonTest) that compares the output with that of the old decoder (HttpPostMultipartRequestDecoderLegacy). This gives me high confidence that there is no feature regression.

In this process, I've found a few behaviors of the old decoder that I consider bugs not worth keeping around. Nonetheless, I've decided to keep HttpPostMultipartRequestDecoder behaving the same way (also to make the fuzzing work), by introducing a "quirk mode" for the streaming decoder. This quirk mode reproduces the small oddities of the original decoder during parsing.

This PR is against netty 5, but if it goes well, I plan to backport the streaming parser to netty 4. My suggestion is to keep the backport in netty-contrib, release a netty 4 version of codec-multipart, and eventually deprecate the form decoder in the netty 4 core repo.

This PR adds a new lower-level post body parsing API (PostBodyDecoder). Right now the only implementation is the MultipartDecoder because that's the difficult one, but I plan to implement URL decoding too.

The new API is purely stream-based and leaves the data management (e.g. writing to disk) to the caller. This makes it more compatible with netty 5 buffers, makes buffer lifecycles easier to understand to avoid buffer leaks, and simplifies the implementation to allow for future optimization.

Because multipart is incredibly complex and the old decoder is the result of years of issue reports for compatibility with obscure clients, and I don't have confidence in the existing test coverage, my main focus in this PR has been validating compatibility. To do this, I've ported HttpPostMultipartRequestDecoder to use the new low-level API internally, and created a fuzz test (MultipartComparisonTest) that compares the output with that of the old decoder (HttpPostMultipartRequestDecoderLegacy). This gives me high confidence that there is no feature regression.

In this process, I've found a few behaviors of the old decoder that I consider bugs not worth keeping around. Nonetheless, I've decided to keep HttpPostMultipartRequestDecoder behaving the same way (also to make the fuzzing work), by introducing a "quirk mode" for the streaming decoder. This quirk mode reproduces the small oddities of the original decoder during parsing.

This PR is against netty 5, but if it goes well, I plan to backport the streaming parser to netty 4. My suggestion is to keep the backport in netty-contrib, release a netty 4 version of codec-multipart, and eventually deprecate the form decoder in the netty 4 core repo.
@yawkat
Copy link
Contributor Author

yawkat commented Nov 29, 2024

@violetagg

@normanmaurer @trustin

WDYT?

@yawkat
Copy link
Contributor Author

yawkat commented Mar 3, 2025

I've added a application/x-www-form-urlencoded implementation, and migrated the old HttpPostStandardRequestDecoder to it. That means the new decoder API can now act as a full replacement for the InterfaceHttpPostRequestDecoders.

@altro3
Copy link

altro3 commented Apr 7, 2025

@violetagg @normanmaurer @trustin Could you see to this?

@loicmathieu
Copy link

Hi,
This PR fixes #12729 which we are affected by.
Any progress on it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants