-
Notifications
You must be signed in to change notification settings - Fork 108
Open
Description
Inspired by rust-lang/rust#121960, I'm looking for SIMD intrinsics that are not inlined in generated code.
https://github.com/BurntSushi/aho-corasick/blob/master/src/packed/vector.rs#L19C1-L27C59:
/// # Safety
///
/// All methods are not safe since they are intended to be implemented using
/// vendor intrinsics, which are also not safe. Callers must ensure that
/// the appropriate target features are enabled in the calling function,
/// and that the current CPU supports them. All implementations should
/// avoid marking the routines with `#[target_feature]` and instead mark
/// them as `#[inline(always)]` to ensure they get appropriately inlined.
/// (`inline(always)` cannot be used with target_feature.)
It's not fully true: if you do not mark the routines with #[target_feature], LLVM will reject to inline them since it does not know if inlining causes ABI issues. So we need to use both #[target_feature] and #[inline(always)].
I find _mm256_loadu_si256 is failed to be inlined in my project and it also applies to the released cargo binary. I think it's another rustc bug at first but finally objdump leads me here.
Step to reproduce it:
Copy & paste the example in readme.
objdump ./target/release/play_rust -D --demangle | grep "core_arch"
e0f1: e8 7a fd 06 00 call 7de70 <core::core_arch::x86::xsave::_xgetbv>
0000000000029f70 <core::ptr::drop_in_place<&aho_corasick::packed::teddy::generic::Mask<core::core_arch::x86::__m128i>>>:
000000000002eba0 <core::ptr::drop_in_place<core::core_arch::x86::__m128i>>:
000000000002ebb0 <core::ptr::drop_in_place<core::core_arch::x86::__m256i>>:
000000000002f0d0 <<core::core_arch::x86::__m128i as core::fmt::Debug>::fmt>:
000000000002f120 <<core::core_arch::x86::__m256i as core::fmt::Debug>::fmt>:
0000000000031810 <core::ptr::drop_in_place<aho_corasick::packed::teddy::generic::Slim<core::core_arch::x86::__m128i,1_usize>>>:
0000000000031820 <core::ptr::drop_in_place<aho_corasick::packed::teddy::generic::Slim<core::core_arch::x86::__m128i,2_usize>>>:
0000000000031830 <core::ptr::drop_in_place<aho_corasick::packed::teddy::generic::Slim<core::core_arch::x86::__m128i,3_usize>>>:
0000000000031840 <core::ptr::drop_in_place<aho_corasick::packed::teddy::generic::Slim<core::core_arch::x86::__m128i,4_usize>>>:
48651: e8 7a 06 00 00 call 48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>
48678: e8 53 06 00 00 call 48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>
486c6: e8 05 06 00 00 call 48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>
486ea: e8 e1 05 00 00 call 48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>
48744: e8 87 05 00 00 call 48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>
48762: e8 69 05 00 00 call 48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>
48861: e8 6a 04 00 00 call 48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>
48888: e8 43 04 00 00 call 48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>
488d4: e8 f7 03 00 00 call 48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>
488f2: e8 d9 03 00 00 call 48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>
489a5: e8 26 03 00 00 call 48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>
489c3: e8 08 03 00 00 call 48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>
48a51: e8 7a 02 00 00 call 48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>
48a78: e8 53 02 00 00 call 48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>
48ac6: e8 05 02 00 00 call 48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>
48aea: e8 e1 01 00 00 call 48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>
48b44: e8 87 01 00 00 call 48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>
48b68: e8 63 01 00 00 call 48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>
48bc2: e8 09 01 00 00 call 48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>
48be0: e8 eb 00 00 00 call 48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>
0000000000048cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>:
000000000007de70 <core::core_arch::x86::xsave::_xgetbv>:
Metadata
Metadata
Assignees
Labels
No labels