Re: Potential RFC: mb_rawurlencode() ?

From: Date: Sat, 22 Mar 2025 13:43:11 +0000
Subject: Re: Potential RFC: mb_rawurlencode() ?
References: 1 2  Groups: php.internals 
Hi Tim & all,

> On Mar 21, 2025, at 06:22, Tim Düsterhus <[email protected]> wrote:
> 
> Am 2025-03-18 18:48, schrieb Paul M. Jones:
>> $iriPath = '/heads/' . rawurlencode($val) . '/tails/';
>> assert($iriPath === '/heads/fü bar/tails/'); // false
> 
> From my reading of RFC 3987 that result is incorrect. The space is listed neither as
> iunreserved nor as sub-delims, and thus isn't a valid ipchar. The space therefore needs
> to be encoded as %20 for IRIs as well. The same mistake applies to the reference
> userland implementation below.

Agreed; the naive implementation would need to be less naive and pay closer attention to the ABNF
for ucschar and ipchar in the spec.
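
As a rough userland sketch of what "less naive" might look like (not a proposed implementation): the function below leaves only the RFC 3987 iunreserved set (ALPHA, DIGIT, "-", ".", "_", "~", plus the ucschar ranges) untouched and percent-encodes every other character, including the space. The name iri_rawurlencode_sketch() is hypothetical, and the choice to encode sub-delims, ":" and "@" as well (mirroring what rawurlencode() does for URIs) is an assumption rather than something the spec requires.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: percent-encode everything except RFC 3987 iunreserved.
 * Assumes $value is valid UTF-8.
 */
function iri_rawurlencode_sketch(string $value): string
{
    // RFC 3986 unreserved, plus the BMP portion of RFC 3987 ucschar.
    $keep = 'A-Za-z0-9\-._~'
        . '\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}';

    // ucschar ranges %x10000-1FFFD through %xD0000-DFFFD (planes 1 to 13).
    for ($plane = 1; $plane <= 13; $plane++) {
        $keep .= sprintf('\x{%X0000}-\x{%XFFFD}', $plane, $plane);
    }

    // Final ucschar range: %xE1000-EFFFD.
    $keep .= '\x{E1000}-\x{EFFFD}';

    // Match any single code point outside the kept set and encode its bytes.
    $result = preg_replace_callback(
        '/[^' . $keep . ']/u',
        static fn (array $m): string => rawurlencode($m[0]),
        $value
    );

    // preg_replace_callback() returns null on malformed UTF-8 input.
    if ($result === null) {
        throw new InvalidArgumentException('Input is not valid UTF-8.');
    }

    return $result;
}

$iriPath = '/heads/' . iri_rawurlencode_sketch("f\u{FC} bar") . '/tails/';
// $iriPath is now '/heads/fü%20bar/tails/'
```

With this, the "ü" survives as-is because it falls inside ucschar, while the space becomes %20, which matches Tim's reading of the ABNF.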

Along those lines, I think there might need to be two additional changes/additions to help with
encoding for RFC 3987 and WHATWG-URL component values:

- http_build_query() would need PHP_QUERY_3987 and PHP_QUERY_WHATWG flags and
corresponding logic (or entirely new functions); and
- parse_str() would need a corresponding mb_parse_str().
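
For context on the first point, here is a small illustration of what the existing http_build_query() encoding modes do with a non-ASCII value today. It uses only the current PHP_QUERY_RFC1738 and PHP_QUERY_RFC3986 constants; the PHP_QUERY_3987 and PHP_QUERY_WHATWG flags mentioned above do not exist and are not shown.

```php
<?php
declare(strict_types=1);

$data = ['q' => "f\u{FC} bar"]; // "fü bar"

// RFC 3986 mode: the space becomes %20 and the UTF-8 bytes of "ü"
// are percent-encoded.
$rfc3986 = http_build_query($data, '', '&', PHP_QUERY_RFC3986);
// 'q=f%C3%BC%20bar'

// RFC 1738 mode (the default): same bytes, but the space becomes "+".
$rfc1738 = http_build_query($data, '', '&', PHP_QUERY_RFC1738);
// 'q=f%C3%BC+bar'

// parse_str() percent-decodes, so the value round-trips.
parse_str($rfc3986, $decoded);
// $decoded['q'] === "fü bar"
```

Neither existing mode can leave the "ü" unencoded, which is the gap an IRI-aware flag or function would presumably have to fill.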


-- pmj

