Skip to content

Update mbstring parameter names #6207

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from
Closed

Conversation

nikic
Copy link
Member

@nikic nikic commented Sep 24, 2020

The mb_str_split change is applied to str_split() as well.

The mb_ereg functions might need another look when we update pcre, as there are some parallels there.

function mb_ereg_replace(string $pattern, string $replacement, string $string, ?string $option = null): string|false|null {}
function mb_ereg_replace(string $pattern, string $replacement, string $string, ?string $options = null): string|false|null {}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The similarity with the pcre extension was already mentioned. In that context I would probably rename $option to $modifiers.

However that would also mean the documentation text needs updating. Which... would not be a bad idea anyways. It speaks about "options" and then lists all possible "options" which also include "modes". So options can also contain a mode. Confusing.

https://www.php.net/manual/en/function.mb-regex-set-options.php

Maybe it should speak about "behavior modifiers" and "syntax modifiers" in stead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, modifiers is a possible alternative term. However, the referenced function is called mb_regex_set_options(), so that seems like the preferred term for mb_regex. (PCRE also never exposes "modifiers" in parameters, as they are part of the regex -- as such, there is no naming consistency to upkeep in that respect at least.)

function mb_convert_variables(string $to, array|string $from, mixed &$var, mixed &...$vars): string|false {}
function mb_convert_variables(string $to_encoding, array|string $from_encoding, mixed &$var, mixed &...$vars): string|false {}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion:

$from / $to -> $decode / $encode

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand the suggestion. Are you saying the parameter names should be $encode and $decode or $encode_encoding and $decode_encoding? Both don't really make sense to me.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My suggestion was to name them $encode and $decode. My idea was that $decode says how to decode the string while $encode says how to encode it again. It made me think of how it would be written in Python:

my_string.decode('iso-8859-1').encode('utf8')

The names are short too. To me it does make sense, but that could just be me. :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to leave these at just $to and $from if the _encoding suffix makes them too long.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P.S. I like their current names ^^ But unfortunately I can only have a closer look a bit later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to mention a few other options: UConverter::__construct() uses $source_encoding and $destination_encoding ($target_encoding reads better for me though).

Copy link
Member

@kocsismate kocsismate left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few other ideas:

  • mb_http_input(): what about using the $type_code param name? Although, $type also works for me.
  • mb_parse_str(): would just $string be clear enough?
  • mb_output_handler(): would the $string param name make sense?
  • mb_substr(): it uses $start while the other similar params are called $offset. I'm not sure though which name should be preferred. Maybe $start is a tiny little bit more clear.
  • mb_strimwidth: Could we still use $length instead of $with, in spite of the function name? I'm ok either way though, just wanted to highlight this inconsistency compared to the other param names :)
  • mb_detect_encoding(): what do you think about $encodings instead of $encoding_list? I hardly ever saw the latter construct in use in php-src.
  • mb_ereg*(): would $matches work instead of $registers, similar to what ext/pcre uses?

@@ -65,29 +65,29 @@ function mb_list_encodings(): array {}

function mb_encoding_aliases(string $encoding): array {}

function mb_encode_mimeheader(string $string, ?string $charset = null, ?string $transfer = null, string $linefeed = "\r\n", int $indent = 0): string {}
function mb_encode_mimeheader(string $string, ?string $charset = null, ?string $transfer_encoding = null, string $linefeed = "\r\n", int $indent = 0): string {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Synonyms for "line feed" according to Wikipedia: new line, line ending, end of line, or line break. I think a few of them would qualify as a better (more well-known) name. E.g. $newline? If you don't agree, can you use $line_feed? As far as I saw it should be spelled as two words.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went with $newline here, agree that line feed is uncommon.

function mb_convert_variables(string $to, array|string $from, mixed &$var, mixed &...$vars): string|false {}
function mb_convert_variables(string $to_encoding, array|string $from_encoding, mixed &$var, mixed &...$vars): string|false {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to mention a few other options: UConverter::__construct() uses $source_encoding and $destination_encoding ($target_encoding reads better for me though).


function mb_encode_numericentity(string $string, array $convmap, ?string $encoding = null, bool $is_hex = false): string {}
function mb_encode_numericentity(string $string, array $map, ?string $encoding = null, bool $use_hex = false): string {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm... So should we use the use_/return_ prefix in case of hash_hmac() et al.?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed this to just $hex to be consistent with what we did for other bool parameters. In this case, there doesn't seem to be a potential to misinterpret this is a non-boolean parameter.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I also had some doubts about the use_ name in the ext/standard PR, so 👍


function mb_decode_numericentity(string $string, array $convmap, ?string $encoding = null): string {}
function mb_decode_numericentity(string $string, array $map, ?string $encoding = null): string {}

function mb_send_mail(string $to, string $subject, string $message, array|string $additional_headers = [], ?string $additional_parameters = null): bool {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WDYT?

Suggested change
function mb_send_mail(string $to, string $subject, string $message, array|string $additional_headers = [], ?string $additional_parameters = null): bool {}
function mb_send_mail(string $to, string $subject, string $message, array|string $additional_headers = [], ?string $additional_params = null): bool {}

Although, as I wrote in the PDO rename PR, we could/should set up some rules for using $param and $arg.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@nikic
Copy link
Member Author

nikic commented Sep 27, 2020

A few other ideas:

* `mb_http_input()`: what about using the `$type_code` param name? Although, `$type` also works for me.

I slightly prefer just $type here, because $type_code sounds like an integer to me.

* `mb_parse_str()`: would just `$string` be clear enough?

Yeah, done.

* `mb_output_handler()`: would the `$string` param name make sense?

I think so, done. The https://www.php.net/ob_start docs call this $buffer, but $string seems fine as well.

* `mb_substr()`: it uses `$start` while the other similar params are called `$offset`. I'm not sure though which name should be preferred. Maybe `$start` is a tiny little bit more clear.

This is a tricky one to which I don't have an answer. I think that generally the $start+$length terminology is good, but with the strpos family, we get the added problem that we also have strrpos, in which case $offset can also be the end of the search (if negative).

* `mb_strimwidth`: Could we still use `$length` instead of `$with`, in spite of the function name? I'm ok either way though, just wanted to highlight this inconsistency compared to the other param names :)

I think I'd prefer to keep $width here, the distinction seems sufficiently important.

* `mb_detect_encoding()`: what do you think about `$encodings` instead of `$encoding_list`? I hardly ever saw the latter construct in use in php-src.

Agreed, done.

* `mb_ereg*()`: would `$matches` work instead of `$registers`, similar to what ext/pcre uses?

I initially kept this because the "registers" terminology is part of the mb_ereg_search_getregs() API name. However, I agree that $matches is the de-facto standard for this. ext/pcre stubs currently use $subpattern, we'll want to change it there as well.

@kocsismate
Copy link
Member

LGTM! :)

@php-pulls php-pulls closed this in cafceea Sep 28, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants