SIMD bitwise logical ops #81924

Open: stephentyrone wants to merge 7 commits into main


stephentyrone
Contributor

No description provided.

Also removed concrete SIMDMask(lowHalf:highHalf:) init, as there is no corresponding generic operation; it was added in error.
There's no reason for these to ever be calls, so they should be transparent instead of just @_alwaysEmitIntoClient (aEIC). Also adds concrete versions of comparisons with scalars, and FileCheck tests to make sure these generate 1-2 instruction sequences in release builds on arm64. (x86_64 is a little trickier to test due to frame pointers, but if we get the right codegen on arm64, in practice we do well on x86_64 for these too.) Also makes the FileCheck patterns for repeating initializers a bit more robust.
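For readers unfamiliar with the operations involved, here is a minimal sketch of a vector-scalar comparison, a pointwise mask combination, and a lane-wise bitwise op using the standard library SIMD API. The variable names and values are illustrative only and do not come from this PR's tests.

```swift
// Vector-scalar comparison: each lane of `v` is compared against the scalar.
let v = SIMD4<Int32>(-2, -1, 0, 1)
let isNegative = v .< 0            // lanes: true, true, false, false
let atLeastMinusOne = v .>= -1     // lanes: false, true, true, true

// Masks combine lane-wise with the pointwise logical operators.
let both = isNegative .& atLeastMinusOne

// Integer vectors support the ordinary bitwise operators lane-wise.
let bits = SIMD4<UInt8>(0b1100, 0b1010, 0xFF, 0x00)
let masked = bits & SIMD4<UInt8>(repeating: 0b0110)
```

Only lane 1 of `both` is set, since -1 is the only element that is both negative and at least -1.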
This instruction was added in SSE4.1, which is in the baseline that macOS targets now, but not in the baseline for older macOS targets, nor for Linux or Windows. Eventually, we'll want to be able to slice these tests more finely by ISA extension targeted, but the arm64 coverage gets us what we need here.
Unsigned vector compares on x86 are pretty fragile (because it doesn't have them pre-AVX-whatever), and it turns out we generate different things for the simulator, macOS, and Linux. Rather than trying to fit all of those into FileCheck patterns, we just rely on the arm64 checks instead (the code itself is all generic, so this is not so unreasonable).
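As a small illustration of why unsigned compares need their own lowering, the sketch below compares the same bit patterns as unsigned and as signed lanes. The values are made up for the example; this is not one of the PR's tests.

```swift
// 200 as UInt8 has the same bit pattern as -56 as Int8.
let u = SIMD16<UInt8>(repeating: 200)
let threshold = SIMD16<UInt8>(repeating: 100)

// Unsigned compare: 200 > 100 in every lane.
let unsignedGreater = u .> threshold

// Reinterpreting the same bits as signed flips the answer: -56 > 100 is false.
let s = SIMD16<Int8>(repeating: Int8(bitPattern: 200))
let signedGreater = s .> SIMD16<Int8>(repeating: 100)
```

A target that only offers signed vector compares therefore has to adjust the operands or use a different sequence to get the unsigned result, which is one plausible source of the platform-dependent output described above.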
Prior to this change, the following simple code:
```
func foo(_ a: SIMD8<Int16>) -> SIMD8<Int16> {
  a.replacing(with: SIMD8.zero &- a, where: a .< 0)
}
```
generated 28 instructions with optimization enabled. Now, at -Onone we get:
```
0000000100000468	sub	sp, sp, #0x10
000000010000046c	mov.16b	v2, v0
0000000100000470	str	xzr, [sp]
0000000100000474	str	xzr, [sp, #0x8]
0000000100000478	str	q2, [sp]
000000010000047c	neg.8h	v1, v2
0000000100000480	cmlt.8h	v0, v2, #0
0000000100000484	bsl.16b	v0, v1, v2
0000000100000488	add	sp, sp, #0x10
000000010000048c	ret
```
and at -O we get:
```
0000000100000418	abs.8h	v0, v0
000000010000041c	ret
```
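The single `abs.8h` is consistent with what `foo` computes: a lane-wise wrapping absolute value. A quick sanity check against a scalar reference, with made-up input values, might look like this:

```swift
func foo(_ a: SIMD8<Int16>) -> SIMD8<Int16> {
  a.replacing(with: SIMD8.zero &- a, where: a .< 0)
}

// Illustrative inputs, including the edge cases Int16.min and Int16.max.
let input = SIMD8<Int16>(-3, 2, 0, -1, 7, -8, Int16.min, Int16.max)
let output = foo(input)

// Scalar reference: negate negative lanes with wrapping subtraction.
var reference = SIMD8<Int16>()
for i in input.indices {
  reference[i] = input[i] < 0 ? 0 &- input[i] : input[i]
}
```

Note that `Int16.min` maps to itself, because the wrapping negation `0 &- Int16.min` overflows back to `Int16.min`.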
stephentyrone requested a review from a team as a code owner on June 3, 2025 at 02:54
@stephentyrone
Contributor Author

@swift-ci test

@stephentyrone
Contributor Author

@swift-ci test compiler performance
