Skip to content

gh-131675: mimalloc: fix mi_atomic_yield on 32-bit ARM #131784

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Mar 31, 2025

Conversation

vfazio
Copy link
Contributor

@vfazio vfazio commented Mar 27, 2025

Previously, mi_atomic_yield for 32-bit ARM:

  • Used a non-standard __ARM_ARCH__ macro to determine if the compiler was targeting ARMv7+ in order to emit a yield instruction, however it was often not defined so fell through to an __ARMEL__ branch

  • Had a logic gap in the #ifdef where an __arm__ target would be expected to emit mi_atomic_yield but there was no condition to define a function for big-endian targets

Now, the standard ACLE __ARM_ARCH macro 1 2 is used which GCC and Clang support.

The branching logic for __ARMEL__ has been removed so if the target architecture supports v7+ instructions, a yield is emitted, otherwise a nop is emitted. This covers both big and little endian scenarios.

Previously, `mi_atomic_yield` for 32-bit ARM:

  * Used a non-standard __ARM_ARCH__ macro to determine if the compiler
    was targeting ARMv7+ in order to emit a `yield` instruction, however
    it was often not defined so fell through to an `__armel__` branch

  * Had a logic gap in the #ifdef where an `__arm__` target would be
    expected to emit `mi_atomic_yield` but there was no condition to
    define a function for big-endian targets

Now, the standard ACLE __ARM_ARCH macro [1] [2] is used which GCC and
Clang support.

The branching logic for `__armel__` has been removed so if the target
architecture supports v7+ instructions, a `yield` is emitted, otherwise
a `nop` is emitted. This covers both big and little endian scenarios.

[1]: https://arm-software.github.io/acle/main/acle.html#arm-architecture-level
[2]: https://arm-software.github.io/acle/main/acle.html#mapping-of-object-build-attributes-to-predefines

Signed-off-by: Vincent Fazio <[email protected]>
@vfazio
Copy link
Contributor Author

vfazio commented Mar 27, 2025

@colesbury FYI

@vfazio
Copy link
Contributor Author

vfazio commented Mar 27, 2025

I tested this on Python 3.13 builds via Buildroot which is where the original issue came from. The builds now work (I tested 32bit big-endian ARMv7+ and ARMv5+ builds).

Note that I did not perform functional testing.

Upstream mimalloc has a similar fix, but it does not fix issues with big-endian builds where __ARM_ARCH < 7.

I may make a a PR for that repository so nothing gets lost the next time CPython syncs the library internally.

@colesbury colesbury added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Mar 27, 2025
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @colesbury for commit 7302058 🤖

Results will be shown at:

https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F131784%2Fmerge

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Mar 27, 2025
@vfazio
Copy link
Contributor Author

vfazio commented Mar 27, 2025

🤞 The RPi build seemed to pass, so that's a good sign.

@colesbury
Copy link
Contributor

Can we use the same change as mimalloc upstream: microsoft/mimalloc@632eab9 ?

@vfazio
Copy link
Contributor Author

vfazio commented Mar 27, 2025

Can we use the same change as mimalloc upstream: microsoft/mimalloc@632eab9 ?

We can, but it does not cover the situation where big endian armv6/5/4 is being targeted.

@vfazio
Copy link
Contributor Author

vfazio commented Mar 27, 2025

I mentioned this here microsoft/mimalloc#1046 (comment)

I'd prefer to not have that gap. I think the fix here is more complete.

If that means I need to submit this change up stream, I will

@vfazio
Copy link
Contributor Author

vfazio commented Mar 27, 2025

Just as an example, for Buildroot, this build would continue to fail otherwise https://autobuild.buildroot.org/results/c0b592969b67e0a6454ab0714ca9540d7bae498d/

I can obviously create a patch for us that does more than what we're willing to do here, it's just an additional maintenance burden.

@diegorusso
Copy link
Contributor

diegorusso commented Mar 27, 2025

Can we use the same change as mimalloc upstream: microsoft/mimalloc@632eab9 ?

if we use this, arm big endian will revert to the default implementation sleep(0). With this patch instead will use the nop.

I also prefer this patch as it groups together all the Arm variants together, it has a better coverage and it is easier to reason about.

With what whatever solution go, we should keep it in sync with mimalloc repo.

@vfazio
Copy link
Contributor Author

vfazio commented Mar 28, 2025

Can we use the same change as mimalloc upstream: microsoft/mimalloc@632eab9 ?

if we use this, arm big endian will revert to the default implementation sleep(0). With this patch instead will use the nop.

It's a little worse than that. I tested a mimalloc build with an armeb toolchain targeting armv6 and it fails to link because the function body isn't emitted. The way it's structured is:

if ARM or ARMEL; then
  if ARM and ARMv7+; then
    emit yield
  elif ARMEL; then
    emit nop
  fi
fi

armeb is only handled if it's also armv7+

# CC='/mnt/development/buildroot/output/per-package/python3/host/bin/armeb-buildroot-linux-gnueabi-cc -march=armv6' cmake -B buildv6 -S . && make -C buildv6

In file included from /home/vfazio/mimalloc/include/mimalloc/types.h:27,
                 from /home/vfazio/mimalloc/include/mimalloc/internal.h:17,
                 from /home/vfazio/mimalloc/src/static.c:17:
/home/vfazio/mimalloc/include/mimalloc/atomic.h:95:20: warning: ‘mi_atomic_yield’ used but never defined
   95 | static inline void mi_atomic_yield(void);
      |                    ^~~~~~~~~~~~~~~
In file included from /home/vfazio/mimalloc/src/static.c:37:
/home/vfazio/mimalloc/src/stats.c:221:13: warning: ‘mi_stat_total_print’ defined but not used [-Wunused-function]
  221 | static void mi_stat_total_print(const mi_stat_count_t* stat, const char* msg, int64_t unit, mi_output_fun* out, void* arg) {
      |             ^~~~~~~~~~~~~~~~~~~
make[2]: Leaving directory '/home/vfazio/mimalloc/buildv6'
[ 79%] Built target mimalloc-obj
make[2]: Entering directory '/home/vfazio/mimalloc/buildv6'
make[2]: Leaving directory '/home/vfazio/mimalloc/buildv6'
make[2]: Entering directory '/home/vfazio/mimalloc/buildv6'
[ 81%] Generating mimalloc.o
make[2]: Leaving directory '/home/vfazio/mimalloc/buildv6'
[ 81%] Built target mimalloc-obj-target
make[2]: Entering directory '/home/vfazio/mimalloc/buildv6'
make[2]: Leaving directory '/home/vfazio/mimalloc/buildv6'
make[2]: Entering directory '/home/vfazio/mimalloc/buildv6'
[ 84%] Building C object CMakeFiles/mimalloc-test-api.dir/test/test-api.c.o
In file included from /home/vfazio/mimalloc/include/mimalloc/types.h:27,
                 from /home/vfazio/mimalloc/test/test-api.c:37:
/home/vfazio/mimalloc/include/mimalloc/atomic.h:95:20: warning: ‘mi_atomic_yield’ declared ‘static’ but never defined [-Wunused-function]
   95 | static inline void mi_atomic_yield(void);
      |                    ^~~~~~~~~~~~~~~
[ 86%] Linking C executable mimalloc-test-api
/mnt/development/buildroot/output/per-package/python3/host/bin/../lib/gcc/armeb-buildroot-linux-gnueabi/13.3.0/../../../../armeb-buildroot-linux-gnueabi/bin/ld: libmimalloc.a(page.c.o): in function `_mi_page_queue_append':
page.c:(.text+0x5b4): undefined reference to `mi_atomic_yield'
/mnt/development/buildroot/output/per-package/python3/host/bin/../lib/gcc/armeb-buildroot-linux-gnueabi/13.3.0/../../../../armeb-buildroot-linux-gnueabi/bin/ld: page.c:(.text+0x5bc): undefined reference to `mi_atomic_yield'
/mnt/development/buildroot/output/per-package/python3/host/bin/../lib/gcc/armeb-buildroot-linux-gnueabi/13.3.0/../../../../armeb-buildroot-linux-gnueabi/bin/ld: libmimalloc.a(page.c.o): in function `_mi_page_use_delayed_free':
page.c:(.text+0x7f8): undefined reference to `mi_atomic_yield'
/mnt/development/buildroot/output/per-package/python3/host/bin/../lib/gcc/armeb-buildroot-linux-gnueabi/13.3.0/../../../../armeb-buildroot-linux-gnueabi/bin/ld: page.c:(.text+0x800): undefined reference to `mi_atomic_yield'
/mnt/development/buildroot/output/per-package/python3/host/bin/../lib/gcc/armeb-buildroot-linux-gnueabi/13.3.0/../../../../armeb-buildroot-linux-gnueabi/bin/ld: libmimalloc.a(page.c.o): in function `_mi_page_try_use_delayed_free':
page.c:(.text+0x880): undefined reference to `mi_atomic_yield'
/mnt/development/buildroot/output/per-package/python3/host/bin/../lib/gcc/armeb-buildroot-linux-gnueabi/13.3.0/../../../../armeb-buildroot-linux-gnueabi/bin/ld: libmimalloc.a(page.c.o):page.c:(.text+0xd18): more undefined references to `mi_atomic_yield' follow
collect2: error: ld returned 1 exit status
make[2]: *** [CMakeFiles/mimalloc-test-api.dir/build.make:98: mimalloc-test-api] Error 1
make[2]: Leaving directory '/home/vfazio/mimalloc/buildv6'
make[1]: *** [CMakeFiles/Makefile2:201: CMakeFiles/mimalloc-test-api.dir/all] Error 2
make[1]: Leaving directory '/home/vfazio/mimalloc/buildv6'
make: *** [Makefile:146: all] Error 2
make: Leaving directory '/home/vfazio/mimalloc/buildv6'

I'll make a PR for upstream and see if they accept it

@vfazio
Copy link
Contributor Author

vfazio commented Mar 28, 2025

@colesbury @diegorusso

I've created microsoft/mimalloc#1050 in case either of you would like to chime in there

@vfazio
Copy link
Contributor Author

vfazio commented Mar 28, 2025

The upstream commit was accepted. I've updated this branch to sync changes with what's in upstream for mi_yield_atomic for this guard block only. Other comments or changes were not synced to limit the scope of the change to the problem called out in the PR description and linked issue.

@vfazio
Copy link
Contributor Author

vfazio commented Mar 28, 2025

@colesbury @diegorusso you guys have been super responsive, I really appreciate that. Not all of my PRs here get feedback so quickly.

@diegorusso
Copy link
Contributor

diegorusso commented Mar 31, 2025

@colesbury @diegorusso you guys have been super responsive, I really appreciate that. Not all of my PRs here get feedback so quickly.

Thanks for the PR and for the one on the mimalloc project as well. LGTM. I leave to @colesbury to approve/merge.

@vfazio
Copy link
Contributor Author

vfazio commented Mar 31, 2025

Awesome, thanks! LMK if there's anything I can help with here.

@colesbury colesbury added the needs backport to 3.13 bugs and security fixes label Mar 31, 2025
@colesbury
Copy link
Contributor

Thanks @vfazio!

@colesbury colesbury enabled auto-merge (squash) March 31, 2025 18:07
@vfazio
Copy link
Contributor Author

vfazio commented Mar 31, 2025

Thanks @vfazio!

No problem!

@colesbury Out of curiosity, what is the best way to get a reviewer assigned for future (or outstanding) PRs? Is there a place to post for help?

@colesbury colesbury merged commit 03f6c8e into python:main Mar 31, 2025
48 checks passed
@miss-islington-app
Copy link

Thanks @vfazio for the PR, and @colesbury for merging it 🌮🎉.. I'm working now to backport this PR to: 3.13.
🐍🍒⛏🤖 I'm not a witch! I'm not a witch!

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Mar 31, 2025
…hongh-131784)

Use the standard `__ARM_ARCH` macro, which is supported by GCC and Clang.

The branching logic for of `__ARMEL__` has been removed so if the target
architecture supports v7+ instructions, a yield is emitted, otherwise a nop
is emitted. This covers both big and little endian scenarios.
(cherry picked from commit 03f6c8e)

Co-authored-by: Vincent Fazio <[email protected]>
Signed-off-by: Vincent Fazio <[email protected]>
@bedevere-app
Copy link

bedevere-app bot commented Mar 31, 2025

GH-131954 is a backport of this pull request to the 3.13 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.13 bugs and security fixes label Mar 31, 2025
colesbury pushed a commit that referenced this pull request Mar 31, 2025
…-131784) (gh-131954)

Use the standard `__ARM_ARCH` macro, which is supported by GCC and Clang.

The branching logic for of `__ARMEL__` has been removed so if the target
architecture supports v7+ instructions, a yield is emitted, otherwise a nop
is emitted. This covers both big and little endian scenarios.
(cherry picked from commit 03f6c8e)

Signed-off-by: Vincent Fazio <[email protected]>
Co-authored-by: Vincent Fazio <[email protected]>
@colesbury
Copy link
Contributor

You can post in the discussion forums, but ymmv depending on the PR.

https://discuss.python.org/c/core-dev/23

seehwan pushed a commit to seehwan/cpython that referenced this pull request Apr 16, 2025
…hongh-131784)

Use the standard `__ARM_ARCH` macro, which is supported by GCC and Clang.

The branching logic for of `__ARMEL__` has been removed so if the target
architecture supports v7+ instructions, a yield is emitted, otherwise a nop
is emitted. This covers both big and little endian scenarios.

Signed-off-by: Vincent Fazio <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants