[stdlib] Add to Unsafe[Mutable]RawBufferPointer implementation of _custom[Last]IndexOfEquatableElement #63128
Conversation
The benchmark needs to be added in a PR separate from the change you intend to measure; #63106 can fulfill that requirement. Please remove the benchmark from this PR. You should also add a test to exercise correctness in test/stdlib/UnsafeRawBufferPointer.swift.
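For example, a correctness test there could follow the StdlibUnittest pattern used by the existing tests in test/stdlib. This is only a rough sketch with placeholder suite/test names and byte values, and the RUN/lit directives are omitted:

import StdlibUnittest

// Sketch of a correctness test for the firstIndex(of:)/lastIndex(of:) fast paths.
var RawBufferIndexOfTests = TestSuite("UnsafeRawBufferPointerIndexOf")

RawBufferIndexOfTests.test("firstIndex(of:)/lastIndex(of:)") {
  // A buffer long enough to exercise both a word loop and a scalar tail.
  var bytes = [UInt8](repeating: 0, count: 100)
  bytes[3] = 42
  bytes[97] = 42
  bytes.withUnsafeBytes { buffer in
    expectEqual(3, buffer.firstIndex(of: 42))
    expectEqual(97, buffer.lastIndex(of: 42))
    expectNil(buffer.firstIndex(of: 7))
    expectNil(buffer.lastIndex(of: 7))
  }
}

runAllTests()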
let word: UInt = UnsafePointer<UInt>(
  bitPattern: address &+ i
)._unsafelyUnwrappedUnchecked.pointee
Would ptr.load(fromByteOffset: i, as: UInt.self) generate worse code?
Or, if we must resort to perhaps-undefined behaviour:
let word = ptr.advanced(by: i).assumingMemoryBound(to: UInt.self).pointee
For the same short instruction count, but helping the compiler understand what's going on:
let word = ptr.advanced(by: i).withMemoryRebound(
to: UInt.self, capacity: 1, { $0.pointee }
)
(80 columns is very narrow.)
Also, UInt isn't 64 bits on every platform. It should be UInt64.
All three fixes you propose generate the same code, two move instructions, on ARM. I personally find the variant ptr.load(fromByteOffset: i, as: UInt.self) the easiest to understand. I don't get why the last variant you proposed helps the compiler understand what's going on, or why we should care about that.
About UInt64: why not UInt128 then? I expect UInt to be the fastest on all platforms (32- or 64-bit). Or perhaps the opposite: UInt64 might be slower than UInt on a 32-bit platform. I don't have a 32-bit platform to benchmark this on. How should we proceed? Should I make a separate PR with a UInt64 variant, so that after benchmarking we can make our choice?
We don't have UInt128 codified, so it's not an option. However, UInt64 does exist on every 32-bit platform of interest. Since this is largely about reducing the number of loop iterations, going to UInt64 seems like a reasonable option.
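For context on the width difference (nothing PR-specific here):

// UInt tracks the platform word, so a UInt-based scan advances 8 bytes per
// iteration on 64-bit targets but only 4 on 32-bit ones; UInt64 advances 8 everywhere.
print(MemoryLayout<UInt>.stride)    // 8 on 64-bit platforms, 4 on 32-bit platforms
print(MemoryLayout<UInt64>.stride)  // always 8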
Regarding ptr.load(fromByteOffset: i, as: UInt.self): I happened to be looking at x86_64, and the load option generates two extra instructions when compared to the straight pointer dereference. 🤷🏽 The last variant I suggested is better than your original one because bouncing between Int and pointer representations confuses the compiler. Did you try using loadUnaligned? If that generates compact code, then we can eliminate the first loop that brings the pointer into alignment.
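For illustration (not taken from this PR): loadUnaligned, available since Swift 5.7, reads a word starting at any byte offset, whereas load(fromByteOffset:as:) requires the offset to be suitably aligned for the loaded type, which is why the current code needs the alignment pre-loop. A tiny self-contained example, with arbitrary buffer contents:

// Sketch: an unaligned word load at an arbitrary offset.
let bytes = [UInt8](repeating: 0x2A, count: 16)
bytes.withUnsafeBytes { raw in
  // Offset 3 is not UInt-aligned; load(fromByteOffset:as:) would require
  // alignment here, but loadUnaligned does not.
  let word = raw.loadUnaligned(fromByteOffset: 3, as: UInt.self)
  print(String(word, radix: 16))  // 2a2a2a2a2a2a2a2a on a 64-bit platform
}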
Did you try using loadUnaligned? If that generates compact code then we can eliminate the first loop that brings the pointer into alignment.
Surprisingly, loadUnaligned compiles to the same single move instruction (checked on x86_64). But if the buffer is shorter than the word width, it will segfault (will it?). So we have to branch here depending on the buffer length. Is it worth it?
It's certainly worth skipping when the buffer is too narrow: we clearly want to avoid the out-of-bounds memory access. e.g.
if count >= stride {
// loop setup
repeat {
// load word and compare
i &+= stride
} while i < endOfStride
}
// final loop
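Fleshing that shape out into something self-contained — a sketch of the approach under discussion, not this PR's actual code; it uses loadUnaligned and UInt64 per the suggestions above, and the constants are the usual byte-broadcast / "has a zero byte" bit-trick values:

// Sketch only: word-at-a-time firstIndex(of:) with a guarded word loop and a scalar tail.
func wordwiseFirstIndex(of byte: UInt8, in buffer: UnsafeRawBufferPointer) -> Int? {
  guard let base = buffer.baseAddress else { return nil }
  let stride = MemoryLayout<UInt64>.stride
  let broadcast = UInt64(byte) &* 0x0101_0101_0101_0101  // byte copied into all 8 lanes
  var i = 0
  if buffer.count >= stride {
    let endOfStride = buffer.count - stride
    repeat {
      let word = base.loadUnaligned(fromByteOffset: i, as: UInt64.self)
      let x = word ^ broadcast
      // "Has a zero byte" trick: nonzero exactly when some byte of `word` equals `byte`.
      if (x &- 0x0101_0101_0101_0101) & ~x & 0x8080_8080_8080_8080 != 0 {
        break  // the match is within these 8 bytes; locate it in the scalar loop below
      }
      i &+= stride
    } while i <= endOfStride
  }
  // Final scalar loop: handles the tail and also buffers narrower than one word.
  while i < buffer.count {
    if buffer[i] == byte { return i }
    i &+= 1
  }
  return nil
}

Calling it via bytes.withUnsafeBytes { wordwiseFirstIndex(of: 42, in: $0) } should agree with the plain firstIndex(of:) on the same buffer.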
Something to look for is that there may be a minimum number of strides for which it's worth paying the setup cost of doing a vectorized loop. I would hope that number is 1, but I wouldn't be surprised by a different result.
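If that turns out to matter, it could be a small guard in front of the word loop; the constant here is purely hypothetical and would have to come from benchmarking:

// Hypothetical tuning knob: only pay the word-loop setup cost when it will run
// at least `minimumWords` iterations; otherwise go straight to the scalar loop.
func shouldUseWordLoop(byteCount: Int) -> Bool {
  let minimumWords = 2  // placeholder value; benchmarks would decide the real one
  return byteCount >= minimumWords &* MemoryLayout<UInt64>.stride
}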
Created a new issue to describe this work: #63200
I pushed a shim we could use for …
After discussions with teammates, we believe that using …
How should we proceed to see the difference between the current stdlib implementation, the current state of this PR, and … I am ready to push …
@valeriyvan I'll trigger a benchmark run with the current state, and we can go from there. Thanks for your patience!
@swift-ci please benchmark
Preliminary benchmarking results, keeping only the new benchmarks. (We can worry about other benchmarks later.)
[Benchmark report: Performance (x86_64) at -O, -Osize, and -Onone; code size at -O, -Osize, and -swiftlibs. Table data, the "How to read the data" notes, and the hardware overview are omitted; per those notes, the tables show only performance differences larger than 8% and code-size differences larger than 1%, and the performance results (not code size) can contain noise.]
@xwu, could you please re-run the benchmark?
@swift-ci benchmark
[Benchmark results: Performance (x86_64) at -O, -Osize, and -Onone; table data omitted.]
@eeckstein, could you please re-run the benchmark once again?
@swift-ci benchmark
[Benchmark results: Performance (x86_64) at -O, -Osize, and -Onone; code size at -O and -Osize; table data omitted.]
@eeckstein, may I ask you to re-run the benchmark once again?
@swift-ci benchmark
[Benchmark results: Performance (x86_64) at -O, -Osize, and -Onone; code size at -O and -Osize; table data omitted.]
Is this an acceptable price for the improvements?
All of these APIs need availability annotations (or …
I for one would be curious how that, as well as adopting @glessard’s suggestion to try …
I don't get why these need availability annotations. Could you please explain?
Will try to tinker with …
We need availability annotations because these new functions will not be inlined (and should not be), and will become part of the …
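Sketching what such an annotation could look like inside the stdlib sources; the SwiftStdlib 9999 spelling is the placeholder used before a release number is assigned, and the stub body stands in for the real implementation:

extension UnsafeRawBufferPointer {
  // Because this function ships in the standard library binary rather than being
  // inlined into clients, it must declare which stdlib release introduces it.
  @available(SwiftStdlib 9999, *)  // placeholder until a release number is assigned
  public func _customIndexOfEquatableElement(_ element: UInt8) -> Int?? {
    return nil  // stub body; the real implementation is the word-wise search above
  }
}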
@swift-ci please benchmark
[Benchmark results: Performance (x86_64) at -O, -Osize, and -Onone; code size at -O and -Osize; table data omitted.]
Removed … Added …
Attempt to speed up firstIndex(of:) of Unsafe[Mutable]RawBufferPointer by implementing _customIndexOfEquatableElement.
Currently the benchmark of firstIndex(of:) terminates with Signals.SIGABRT: 6 and I have no idea why. In the separate branch #63106 this benchmark works normally.
Any help is appreciated.
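For context, the customization point being implemented returns a double optional. As I understand the contract (standard Collection behavior, not something specific to this PR), a caller interprets it roughly like this; the function name is illustrative:

// _customIndexOfEquatableElement returns Index?? with three meaningful cases:
//   nil              -> no fast answer; the caller falls back to a linear search
//   .some(nil)       -> searched: the element is definitely not present
//   .some(.some(i))  -> searched: the first occurrence is at index i
// Sketch of how a caller consumes that answer:
func firstIndexViaCustomization(of byte: UInt8,
                                in buffer: UnsafeRawBufferPointer) -> Int? {
  if let answer = buffer._customIndexOfEquatableElement(byte) {
    return answer  // definitive result from the fast path (found or not found)
  }
  return buffer.firstIndex(of: byte)  // generic linear-search fallback
}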