
HttpUtils method urlEncodeFormParams Charset #1444


Closed
keyhunter opened this issue Jul 31, 2017 · 4 comments

@keyhunter

keyhunter commented Jul 31, 2017

When the charset is not UTF-8, this code (org.asynchttpclient.util.HttpUtils) produces the wrong result:

    public static ByteBuffer urlEncodeFormParams(List<Param> params, Charset charset) {
        return StringUtils.charSequence2ByteBuffer(urlEncodeFormParams0(params), charset);
    }

The percent-encoding step is hard-wired to UTF-8 and cannot be changed, so the output is wrong for other charsets (GBK, etc.).

@slandelle
Contributor

@keyhunter Could you provide a failing test case please?

@keyhunter
Author

keyhunter commented Jul 31, 2017

Of course, @slandelle .
```java
String beEncodeValue = "中文";
List<Param> params = new ArrayList<>();
params.add(new Param("language", beEncodeValue));

// Same parameters, two different charsets
ByteBuffer result1 = HttpUtils.urlEncodeFormParams(params, Charset.forName("GBK"));
ByteBuffer result2 = HttpUtils.urlEncodeFormParams(params, Charset.forName("UTF-8"));
```

Both calls invoke the method urlEncodeFormParams0, which uses Utf8UrlEncoder, so they produce the same result.

But the expected results are:

    URLEncoder.encode(beEncodeValue, "GBK");   // should be "%D6%D0%CE%C4"
    URLEncoder.encode(beEncodeValue, "UTF-8"); // should be "%E4%B8%AD%E6%96%87"

They are different.
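
For anyone who wants to check the expected values without the library, here is a minimal standalone sketch using only the JDK's `java.net.URLEncoder`. The class name `CharsetEncodeDemo` and the `encode` helper are made up for illustration, and the value is written with Unicode escapes so the result does not depend on the source file encoding.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class CharsetEncodeDemo {
    // The two-character value from the test case, written with Unicode escapes.
    static final String VALUE = "\u4e2d\u6587";

    // Hypothetical helper: wraps URLEncoder so callers need not handle the checked exception.
    static String encode(String value, String charsetName) {
        try {
            return URLEncoder.encode(value, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encode(VALUE, "GBK"));   // %D6%D0%CE%C4
        System.out.println(encode(VALUE, "UTF-8")); // %E4%B8%AD%E6%96%87
    }
}
```

The charset decides which bytes get percent-escaped: two bytes per character in GBK here, three in UTF-8.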

@slandelle
Contributor

@keyhunter Should be fixed (even though there's room for perf improvement). Could you please check on your side?

slandelle added a commit that referenced this issue Jul 31, 2017
Motivation:

Form urlencoding doesn’t properly honor the charset. It uses it for
converting the escaped string into bytes, while those are supposed to be
already in the US-ASCII range. It should be using it to first encode the
text into bytes, which should then be escaped.

Modifications:

Use current optimized code for UTF-8 and fall back to URLEncoder for
other charsets.

Results:

Proper encoding when charset is different from UTF-8, eg GBK
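
To illustrate the ordering the commit message describes, here is a rough sketch that is not the actual commit: it applies the charset during percent-escaping and only then converts the result to bytes. `FormEncodeSketch`, `formEncode` and `formEncodeToBuffer` are hypothetical names, a plain `Map` stands in for the library's `List<Param>`, and the sketch uses `URLEncoder` for every charset instead of keeping a separate optimized UTF-8 path.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class FormEncodeSketch {

    // Hypothetical helper: percent-encodes each name and value with the
    // requested charset and joins the pairs with '&'.
    static String formEncode(Map<String, String> params, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try {
            for (Map.Entry<String, String> e : params.entrySet()) {
                if (sb.length() > 0) {
                    sb.append('&');
                }
                sb.append(URLEncoder.encode(e.getKey(), charset.name()))
                  .append('=')
                  .append(URLEncoder.encode(e.getValue(), charset.name()));
            }
        } catch (UnsupportedEncodingException ex) {
            // Not expected: the name comes from an existing Charset instance.
            throw new IllegalStateException(ex);
        }
        return sb.toString();
    }

    // The escaped form is pure US-ASCII, so this final conversion to bytes
    // no longer depends on the requested charset.
    static ByteBuffer formEncodeToBuffer(Map<String, String> params, Charset charset) {
        return ByteBuffer.wrap(formEncode(params, charset).getBytes(StandardCharsets.US_ASCII));
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("language", "\u4e2d\u6587"); // the value from the test case
        System.out.println(formEncode(params, Charset.forName("GBK")));  // language=%D6%D0%CE%C4
        System.out.println(formEncode(params, StandardCharsets.UTF_8)); // language=%E4%B8%AD%E6%96%87
    }
}
```
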
@keyhunter
Author

OK, thanks.
👍
