-
-
Notifications
You must be signed in to change notification settings - Fork 32.3k
gh-131675: mimalloc: fix mi_atomic_yield on 32-bit ARM #131784
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Previously, `mi_atomic_yield` for 32-bit ARM: * Used a non-standard __ARM_ARCH__ macro to determine if the compiler was targeting ARMv7+ in order to emit a `yield` instruction, however it was often not defined so fell through to an `__armel__` branch * Had a logic gap in the #ifdef where an `__arm__` target would be expected to emit `mi_atomic_yield` but there was no condition to define a function for big-endian targets Now, the standard ACLE __ARM_ARCH macro [1] [2] is used which GCC and Clang support. The branching logic for `__armel__` has been removed so if the target architecture supports v7+ instructions, a `yield` is emitted, otherwise a `nop` is emitted. This covers both big and little endian scenarios. [1]: https://arm-software.github.io/acle/main/acle.html#arm-architecture-level [2]: https://arm-software.github.io/acle/main/acle.html#mapping-of-object-build-attributes-to-predefines Signed-off-by: Vincent Fazio <[email protected]>
@colesbury FYI |
I tested this on Python 3.13 builds via Buildroot which is where the original issue came from. The builds now work (I tested 32bit big-endian ARMv7+ and ARMv5+ builds). Note that I did not perform functional testing. Upstream mimalloc has a similar fix, but it does not fix issues with big-endian builds where I may make a a PR for that repository so nothing gets lost the next time CPython syncs the library internally. |
🤖 New build scheduled with the buildbot fleet by @colesbury for commit 7302058 🤖 Results will be shown at: https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F131784%2Fmerge If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again. |
🤞 The RPi build seemed to pass, so that's a good sign. |
Can we use the same change as mimalloc upstream: microsoft/mimalloc@632eab9 ? |
We can, but it does not cover the situation where big endian armv6/5/4 is being targeted. |
I mentioned this here microsoft/mimalloc#1046 (comment) I'd prefer to not have that gap. I think the fix here is more complete. If that means I need to submit this change up stream, I will |
Just as an example, for Buildroot, this build would continue to fail otherwise https://autobuild.buildroot.org/results/c0b592969b67e0a6454ab0714ca9540d7bae498d/ I can obviously create a patch for us that does more than what we're willing to do here, it's just an additional maintenance burden. |
if we use this, arm big endian will revert to the default implementation I also prefer this patch as it groups together all the Arm variants together, it has a better coverage and it is easier to reason about. With what whatever solution go, we should keep it in sync with mimalloc repo. |
It's a little worse than that. I tested a mimalloc build with an armeb toolchain targeting armv6 and it fails to link because the function body isn't emitted. The way it's structured is:
armeb is only handled if it's also armv7+
I'll make a PR for upstream and see if they accept it |
I've created microsoft/mimalloc#1050 in case either of you would like to chime in there |
The upstream commit was accepted. I've updated this branch to sync changes with what's in upstream for |
@colesbury @diegorusso you guys have been super responsive, I really appreciate that. Not all of my PRs here get feedback so quickly. |
Thanks for the PR and for the one on the mimalloc project as well. LGTM. I leave to @colesbury to approve/merge. |
Awesome, thanks! LMK if there's anything I can help with here. |
Thanks @vfazio! |
No problem! @colesbury Out of curiosity, what is the best way to get a reviewer assigned for future (or outstanding) PRs? Is there a place to post for help? |
Thanks @vfazio for the PR, and @colesbury for merging it 🌮🎉.. I'm working now to backport this PR to: 3.13. |
…hongh-131784) Use the standard `__ARM_ARCH` macro, which is supported by GCC and Clang. The branching logic for of `__ARMEL__` has been removed so if the target architecture supports v7+ instructions, a yield is emitted, otherwise a nop is emitted. This covers both big and little endian scenarios. (cherry picked from commit 03f6c8e) Co-authored-by: Vincent Fazio <[email protected]> Signed-off-by: Vincent Fazio <[email protected]>
GH-131954 is a backport of this pull request to the 3.13 branch. |
…-131784) (gh-131954) Use the standard `__ARM_ARCH` macro, which is supported by GCC and Clang. The branching logic for of `__ARMEL__` has been removed so if the target architecture supports v7+ instructions, a yield is emitted, otherwise a nop is emitted. This covers both big and little endian scenarios. (cherry picked from commit 03f6c8e) Signed-off-by: Vincent Fazio <[email protected]> Co-authored-by: Vincent Fazio <[email protected]>
You can post in the discussion forums, but ymmv depending on the PR. |
…hongh-131784) Use the standard `__ARM_ARCH` macro, which is supported by GCC and Clang. The branching logic for of `__ARMEL__` has been removed so if the target architecture supports v7+ instructions, a yield is emitted, otherwise a nop is emitted. This covers both big and little endian scenarios. Signed-off-by: Vincent Fazio <[email protected]>
Previously,
mi_atomic_yield
for 32-bit ARM:Used a non-standard
__ARM_ARCH__
macro to determine if the compiler was targeting ARMv7+ in order to emit ayield
instruction, however it was often not defined so fell through to an__ARMEL__
branchHad a logic gap in the #ifdef where an
__arm__
target would be expected to emitmi_atomic_yield
but there was no condition to define a function for big-endian targetsNow, the standard ACLE __ARM_ARCH macro 1 2 is used which GCC and Clang support.
The branching logic for
__ARMEL__
has been removed so if the target architecture supports v7+ instructions, ayield
is emitted, otherwise anop
is emitted. This covers both big and little endian scenarios.