_mm256_loadu_si256 fails to be inlined due to ABI issues #140

@usamoi

Inspired by rust-lang/rust#121960, I'm looking for SIMD intrinsics that are not inlined in generated code.

https://github.com/BurntSushi/aho-corasick/blob/master/src/packed/vector.rs#L19C1-L27C59:

/// # Safety
///
/// All methods are not safe since they are intended to be implemented using
/// vendor intrinsics, which are also not safe. Callers must ensure that
/// the appropriate target features are enabled in the calling function,
/// and that the current CPU supports them. All implementations should
/// avoid marking the routines with `#[target_feature]` and instead mark
/// them as `#[inline(always)]` to ensure they get appropriately inlined.
/// (`inline(always)` cannot be used with target_feature.)

That is not entirely true: if the routines are not marked with #[target_feature], LLVM will refuse to inline the intrinsics they call, because it cannot tell whether inlining would cause ABI issues. So we need to use both #[target_feature] and #[inline(always)].
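To illustrate the point, here is a minimal sketch (not code from aho-corasick) of a helper that carries `#[target_feature]`, which should allow LLVM to inline the AVX intrinsics into it. The names `load_add` and `add_bytes` are hypothetical. The sketch assumes stable Rust, where, as the quoted comment notes, `inline(always)` cannot be combined with `target_feature`, so it uses plain `#[inline]` instead.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{__m256i, _mm256_add_epi8, _mm256_loadu_si256, _mm256_storeu_si256};

/// Hypothetical helper: adds two 32-byte arrays lane by lane (wrapping).
/// Because it is compiled with AVX2 enabled, the intrinsics it calls have
/// compatible target features and are eligible for inlining into it.
#[cfg(target_arch = "x86_64")]
#[inline] // assumption: plain `inline`, since `inline(always)` is rejected here on stable
#[target_feature(enable = "avx2")]
unsafe fn load_add(a: &[u8; 32], b: &[u8; 32]) -> [u8; 32] {
    let mut out = [0u8; 32];
    // SAFETY: both inputs and the output are exactly 32 bytes, and the
    // unaligned load/store intrinsics have no alignment requirement.
    unsafe {
        let va = _mm256_loadu_si256(a.as_ptr() as *const __m256i);
        let vb = _mm256_loadu_si256(b.as_ptr() as *const __m256i);
        let sum = _mm256_add_epi8(va, vb);
        _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, sum);
    }
    out
}

/// Safe entry point: checks for AVX2 at runtime before calling the
/// feature-gated helper, and falls back to scalar code otherwise.
pub fn add_bytes(a: &[u8; 32], b: &[u8; 32]) -> [u8; 32] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: the CPU was just confirmed to support AVX2.
            return unsafe { load_add(a, b) };
        }
    }
    // Scalar fallback with the same wrapping semantics.
    let mut out = [0u8; 32];
    for (o, (x, y)) in out.iter_mut().zip(a.iter().zip(b.iter())) {
        *o = x.wrapping_add(*y);
    }
    out
}
```

Running the objdump check shown below on a release binary built from a sketch like this is one way to see whether any `call` to `_mm256_loadu_si256` remains.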

I found that _mm256_loadu_si256 fails to be inlined in my project, and the same applies to the released cargo binary. At first I thought it was another rustc bug, but objdump eventually led me here.

Steps to reproduce:
Copy and paste the example from the README, then inspect the release binary:

objdump ./target/release/play_rust -D --demangle | grep "core_arch"
    e0f1:       e8 7a fd 06 00          call   7de70 <core::core_arch::x86::xsave::_xgetbv>
0000000000029f70 <core::ptr::drop_in_place<&aho_corasick::packed::teddy::generic::Mask<core::core_arch::x86::__m128i>>>:
000000000002eba0 <core::ptr::drop_in_place<core::core_arch::x86::__m128i>>:
000000000002ebb0 <core::ptr::drop_in_place<core::core_arch::x86::__m256i>>:
000000000002f0d0 <<core::core_arch::x86::__m128i as core::fmt::Debug>::fmt>:
000000000002f120 <<core::core_arch::x86::__m256i as core::fmt::Debug>::fmt>:
0000000000031810 <core::ptr::drop_in_place<aho_corasick::packed::teddy::generic::Slim<core::core_arch::x86::__m128i,1_usize>>>:
0000000000031820 <core::ptr::drop_in_place<aho_corasick::packed::teddy::generic::Slim<core::core_arch::x86::__m128i,2_usize>>>:
0000000000031830 <core::ptr::drop_in_place<aho_corasick::packed::teddy::generic::Slim<core::core_arch::x86::__m128i,3_usize>>>:
0000000000031840 <core::ptr::drop_in_place<aho_corasick::packed::teddy::generic::Slim<core::core_arch::x86::__m128i,4_usize>>>:
   48651:       e8 7a 06 00 00          call   48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>
   48678:       e8 53 06 00 00          call   48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>
   486c6:       e8 05 06 00 00          call   48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>
   486ea:       e8 e1 05 00 00          call   48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>
   48744:       e8 87 05 00 00          call   48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>
   48762:       e8 69 05 00 00          call   48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>
   48861:       e8 6a 04 00 00          call   48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>
   48888:       e8 43 04 00 00          call   48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>
   488d4:       e8 f7 03 00 00          call   48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>
   488f2:       e8 d9 03 00 00          call   48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>
   489a5:       e8 26 03 00 00          call   48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>
   489c3:       e8 08 03 00 00          call   48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>
   48a51:       e8 7a 02 00 00          call   48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>
   48a78:       e8 53 02 00 00          call   48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>
   48ac6:       e8 05 02 00 00          call   48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>
   48aea:       e8 e1 01 00 00          call   48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>
   48b44:       e8 87 01 00 00          call   48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>
   48b68:       e8 63 01 00 00          call   48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>
   48bc2:       e8 09 01 00 00          call   48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>
   48be0:       e8 eb 00 00 00          call   48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>
0000000000048cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>:
000000000007de70 <core::core_arch::x86::xsave::_xgetbv>:
