Description
Name and Version
version: 5298 (141a908)
built with Apple clang version 15.0.0 (clang-1500.3.9.4) for arm64-apple-darwin24.3.0
Operating systems
Mac
Which llama.cpp modules do you know to be affected?
llama-cli
Command line
./llama-cli -m Qwen3-235B-A22B.i1-IQ3_M.gguf --no-mmap -fa -c 8192
./llama-cli -m Qwen3-235B-A22B.i1-IQ3_M.gguf --no-mmap -fa -c 16384
Problem description & steps to reproduce
- Download https://huggingface.co/mradermacher/Qwen3-235B-A22B-i1-GGUF (IQ3_M)
- `sudo nano /etc/sysctl.conf` and add the line `iogpu.wired_limit_mb=122880`
- Restart the computer
- Run `./llama-cli -m Qwen3-235B-A22B.i1-IQ3_M.gguf --no-mmap -fa -c 8192` (or `-c 16384`)
- Observe slow loading times and unnecessary swapping in the Memory section of Activity Monitor
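For reference, this is the persistent form of the wired-memory limit used in the steps above (a sketch, assuming macOS on Apple Silicon):

```
# /etc/sysctl.conf — raise the GPU wired-memory limit to 120 GiB (applied at boot)
iogpu.wired_limit_mb=122880
```

On recent macOS versions the same value can reportedly also be set at runtime with `sudo sysctl iogpu.wired_limit_mb=122880`, though a runtime setting does not survive a reboot.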
Videos showing extended swap/unswap times:
8k ctx: https://github.com/user-attachments/assets/7cde30ce-3770-4582-85cf-2c4382f527dc
16k ctx: https://github.com/user-attachments/assets/b97a88c5-36f5-4477-8525-fd78050eadc2
(The point where memory pressure drops at the end is when the model finishes loading.)

An 8k-16k context should only use 100,127-101,671 MiB of memory, per the calculations below. An M3 Max with 131,072 MiB of memory and a 122,880 MiB VRAM limit should be able to handle this without the long swap/unswap process.
8k ctx

| Component | GPU (MiB) | CPU (MiB) | Total (MiB) |
|---|---|---|---|
| Model buffer | 98,030.93 | 255.02 | 98,285.95 |
| KV-cache buffer | 1,504.00 | 0.00 | 1,504.00 |
| Compute buffer | 312.75 | 24.01 | 336.76 |
| Output buffer | 0.00 | 0.58 | 0.58 |
| Grand total | 99,847.68 | 279.61 | 100,127.29 |
16k ctx

| Component | GPU (MiB) | CPU (MiB) | Total (MiB) |
|---|---|---|---|
| Model buffer | 98,030.93 | 255.02 | 98,285.95 |
| KV-cache buffer | 3,008.00 | 0.00 | 3,008.00 |
| Compute buffer | 336.75 | 40.01 | 376.76 |
| Output buffer | 0.00 | 0.58 | 0.58 |
| Grand total | 101,375.68 | 295.61 | 101,671.29 |
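The grand totals in the two tables can be cross-checked with a short script (the buffer names and grouping are mine; the figures are copied from the tables above):

```python
# Recompute the per-context-size memory totals claimed in the tables.
# Each entry is (GPU MiB, CPU MiB), copied from the bug report.
buffers = {
    8192:  {"model": (98030.93, 255.02), "kv": (1504.00, 0.00),
            "compute": (312.75, 24.01), "output": (0.00, 0.58)},
    16384: {"model": (98030.93, 255.02), "kv": (3008.00, 0.00),
            "compute": (336.75, 40.01), "output": (0.00, 0.58)},
}

for ctx, parts in buffers.items():
    gpu = sum(g for g, _ in parts.values())
    cpu = sum(c for _, c in parts.values())
    print(f"{ctx:>5} ctx: GPU {gpu:,.2f} + CPU {cpu:,.2f} = {gpu + cpu:,.2f} MiB")
```

Both totals land well under the 122,880 MiB wired limit, and the KV cache doubles (1,504 → 3,008 MiB) as expected when the context doubles from 8k to 16k.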
First Bad Commit
No response