Move utf8_encode and utf8_decode to ext/standard #2160
looks good! 👍
    zend_string *str;
    unsigned char c;

    str = zend_string_safe_alloc(len, 4, 0, 0);
Why 4?
I think 2 should be enough here actually, since converting from ISO-8859-1 to UTF-8 never produces more than 2 bytes per input byte.
Good catch. The original function was (needlessly!) more generic and supported encoding 3- and 4-byte characters, which I stripped out, but I missed this.
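To illustrate why a multiplier of 2 is sufficient, here is a minimal standalone sketch of an ISO-8859-1 to UTF-8 conversion. It is not the code from this patch: `latin1_to_utf8` is a hypothetical helper, and it uses plain `malloc` instead of `zend_string_safe_alloc`. Every input byte below 0x80 maps to one output byte and every byte from 0x80 to 0xFF maps to exactly two, so `len * 2` is the worst case.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not the PHP implementation): encode an ISO-8859-1
 * buffer as UTF-8. Returns a malloc'd, NUL-terminated buffer and stores its
 * length in *out_len, or returns NULL on overflow or allocation failure. */
static char *latin1_to_utf8(const unsigned char *src, size_t len, size_t *out_len)
{
    char *dst;
    size_t pos = 0;

    assert(src != NULL || len == 0);

    /* Worst case: every input byte is >= 0x80 and becomes two output bytes. */
    if (len > (SIZE_MAX - 1) / 2) {
        return NULL;
    }
    dst = malloc(len * 2 + 1);
    if (dst == NULL) {
        return NULL;
    }

    for (size_t i = 0; i < len; i++) {
        unsigned char c = src[i];
        if (c < 0x80) {
            /* ASCII range: copied through unchanged. */
            dst[pos++] = (char)c;
        } else {
            /* U+0080..U+00FF: two-byte sequence 110xxxxx 10xxxxxx. */
            dst[pos++] = (char)(0xC0 | (c >> 6));
            dst[pos++] = (char)(0x80 | (c & 0x3F));
        }
    }
    dst[pos] = '\0';
    *out_len = pos;
    return dst;
}
```

For example, the two-byte input `"a\xE9"` ("aé" in ISO-8859-1) yields the three bytes `61 C3 A9`, which stays within the `len * 2` bound.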
When I copied across the internal encode/decode functions to
On that line of thought: though it's trivial, there are now at least four places in the PHP source where UTF-8 generation is done: the lexer ( (It's interesting that PHP core has a bunch of multibyte stuff in
Can I suggest that this PR stays focused on just moving these string functions to a sensible place, and that we try to make it "nice" in another PR that addresses the duplication issue?
Yes, that's fair.
@yohgaki Do you have any thoughts on this?
It seems these functions were meant to support various encodings when they were implemented. That will not happen, I guess. It may be better to soft-deprecate them. However, I don't have a strong opinion on this.
I don't think an RFC is required here, and I would like to see this merged. Am I wrong about that? Do we need an RFC?
Well, it doesn't break backwards compatibility, and it doesn't even introduce a new feature; it just moves something from one extension to another. It's not even a “bug fix” really. So… I don't think it requires an RFC? I could just merge this right away if there are no objections.
Another change that maybe should be done is renaming the functions and making these names be aliases, because
I think it's okay to merge the patch as it is, just to move the functions out of xml. If you wanted to tidy up some stuff, create aliases and remove duplication, then that would need an RFC. I'm happy for this to be merged into master either way, but I'd wait for someone else to +1 that before doing it. Please remember a NEWS and UPGRADING entry when you merge.
+1
9332d5e to 1a512ee
Merged into master, so this will be in PHP 7.2.
These are generic string functions which have a use outside of processing XML, and don't have any dependency on libxml. Therefore, this patch moves them to `ext/standard`.

This patch doesn't touch `NEWS` and `UPGRADING` currently. I can do those and merge this myself, if it's approved.