locale.normalize() and getdefaultlocale() convert C.UTF-8 to en_US.UTF-8 #74940
I have a system where the default locale is C.UTF-8, and en_US.UTF-8 is
But locale.normalize() unhelpfully converts "C.UTF-8" to "en_US.UTF-8".

So the following crashes for me:

python3.6 -c "import locale;locale.setlocale(locale.LC_ALL, ('C', 'UTF-8'))"

Similarly getdefaultlocale() returns ('en_US', 'UTF-8'), so this crashes too:

export LANG=C.UTF-8

This behaviour is caused by a locale_alias entry in Lib/locale.py.
https://bugs.python.org/issue20076 documents its addition but doesn't
I can see that it might be helpful to provide such a conversion if |
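To see what the alias table does on a given interpreter, a minimal sketch along these lines may help. The helper name `c_locale_aliases` is made up for illustration, and the output is not shown because it depends on the Python version.

```python
import locale

def c_locale_aliases():
    """Return entries of locale.locale_alias whose key is 'c' or starts with 'c.'.

    Hypothetical helper for inspection only; it is not part of the locale module.
    """
    return {
        key: value
        for key, value in locale.locale_alias.items()
        if key == "c" or key.startswith("c.")
    }

# normalize() consults the alias table; on affected versions the first call
# is the one that turns "C.UTF-8" into "en_US.UTF-8".
print(locale.normalize("C.UTF-8"))
print(locale.normalize("C"))

for key, value in sorted(c_locale_aliases().items()):
    print(f"{key!r} -> {value!r}")
```

If the first line printed is not a C-based name, any `setlocale()` call that receives a tuple will ask the C library for the aliased locale instead.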
I'm honestly not sure how our Python level locale handling really works (I've mainly worked on the lower level C locale manipulation), so adding folks to the nosy list based on bpo-20076 and bpo-29571. I agree we shouldn't be aliasing C.UTF-8 to en_US.UTF-8 though - we took en_US.UTF-8 out of the locale coercion fallback list in PEP-538 because it wasn't really right. |
I've investigated a bit more.

First, I've tried with Python 3.7.0a1. As you'd expect, PEP-538 means
Second, I've looked through locale.py a bit. I believe what it calls the

This leads to some rather odd results. With 3.7.0a1 and no locale environment variables:

>>> import locale
>>> locale.getlocale()
('en_US', 'UTF-8')
# getlocale() is lying: the effective locale is really C.UTF-8
>>> sorted("abcABC", key=locale.strxfrm)
['A', 'B', 'C', 'a', 'b', 'c']

Third, I've checked on a system which does have en_US.UTF-8 installed,

>>> import locale
>>> locale.setlocale(locale.LC_ALL, ('C', 'UTF-8'))
'en_US.UTF-8'
>>> locale.getlocale()
('en_US', 'UTF-8')
# now getlocale() is telling the truth, and the user isn't getting the
# collation they requested
>>> sorted("abcABC", key=locale.strxfrm)
['a', 'A', 'b', 'B', 'c', 'C'] |
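One way to check whether getlocale() agrees with the C library is to compare it with the raw string that setlocale() returns when called as a query. This is a minimal sketch. `describe_ctype` is a made-up helper name, and the results vary by platform and Python version, so no output is shown.

```python
import locale

def describe_ctype():
    """Return (raw, parsed) views of the current LC_CTYPE setting.

    raw    -- the string the C library reports (setlocale() used as a query)
    parsed -- what getlocale() derives from that string, or None if it
              cannot parse the name
    """
    raw = locale.setlocale(locale.LC_CTYPE)  # query only, changes nothing
    try:
        parsed = locale.getlocale(locale.LC_CTYPE)
    except ValueError:
        parsed = None
    return raw, parsed

raw, parsed = describe_ctype()
print("C library reports:  ", raw)
print("getlocale() reports:", parsed)
```

When the two disagree, for example a raw value of `C.UTF-8` next to a parsed value of `('en_US', 'UTF-8')`, the raw string is the one that describes the effective locale.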
That can't happen. The "C" locale describes the behavior defined in the ISO C standard. It's built-in to glibc (and should be for all other libc implementations). All other locales require external support (i.e. /usr/lib/locale/<locale>) https://www.gnu.org/software/libc/manual/html_node/Standard-Locales.html#Standard-Locales |
What can we do about reverting that change? Python's current behavior causes unexpected exceptions, especially in containers. I'm currently debugging test failures in a Python application that occur in Fedora rawhide containers. Those containers don't have any locales installed. The test software saves its current locale, changes the locale in order to run a test, and then restores the original. Because Python is incorrectly reporting the original locale as "en_US", restoring the original fails. |
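The save-and-restore pattern described above can be sketched without touching the alias table by saving the raw string from a setlocale() query rather than the tuple from getlocale(). This assumes CPython's behaviour in Lib/locale.py, where only non-string arguments to setlocale() are passed through normalize(). `temporary_locale` is a hypothetical helper, not something from the application being debugged.

```python
import locale
from contextlib import contextmanager

@contextmanager
def temporary_locale(category, name):
    """Switch `category` to `name` for the duration of the block.

    The original setting is saved as the raw string returned by the
    C library and restored as a string, so it is never rewritten by
    the alias table.
    """
    saved = locale.setlocale(category)  # query: returns the current setting
    try:
        locale.setlocale(category, name)
        yield
    finally:
        locale.setlocale(category, saved)

# Usage: run a piece of test code under the plain "C" collation rules.
with temporary_locale(locale.LC_COLLATE, "C"):
    print(sorted("abcABC", key=locale.strxfrm))
```

Restoring from the saved string should succeed even in a container with no locales installed, because the C library is handed back exactly the name it reported.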
It certainly can. Take for example RHEL 7 or 6. |
As an example, let's consider dnf's i18n setup:
If setting the environment-specified locale fails, dnf will attempt to set the locale
Unfortunately, because of the alias, this process will be unable to set the 'C.UTF-8' |
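A fallback chain of this kind can be sketched as follows. This is not dnf's actual code: the function name `set_locale_with_fallback` and the candidate list are assumptions for illustration. Candidates are passed as strings, which in CPython are handed to the C library without going through normalize().

```python
import locale

def set_locale_with_fallback(candidates=("", "C.UTF-8", "C")):
    """Try each locale name in order and return the first one accepted.

    An empty string asks the C library to use the environment
    (LANG, LC_ALL, ...). Names are given as strings rather than tuples
    so that the alias table cannot rewrite them.
    """
    for name in candidates:
        try:
            return locale.setlocale(locale.LC_ALL, name)
        except locale.Error:
            continue  # not available on this system, try the next one
    raise locale.Error("none of the candidate locales could be set")

print(set_locale_with_fallback())
```

With the tuple form `('C', 'UTF-8')` in place of the string `"C.UTF-8"`, an affected Python version would request `en_US.UTF-8` instead, which is the failure described above.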
… installed (GH-14925) This change removes the alias of the 'C' locale to 'en_US'. Because of this alias, it is currently impossible for an application to use setlocale() to specify a UTF-8 locale on a system that has no locales installed, but which supports the C.UTF-8 locale/encoding.
The C locale no longer does what we need in Python 3.12, see python/cpython#74940