
Misc. bug: Extended swap/unswap times when loading large models on Apple Silicon #13361

Closed
@Azirine


Name and Version

version: 5298 (141a908)
built with Apple clang version 15.0.0 (clang-1500.3.9.4) for arm64-apple-darwin24.3.0

Operating systems

Mac

Which llama.cpp modules do you know to be affected?

llama-cli

Command line

./llama-cli -m Qwen3-235B-A22B.i1-IQ3_M.gguf --no-mmap -fa -c 8192
./llama-cli -m Qwen3-235B-A22B.i1-IQ3_M.gguf --no-mmap -fa -c 16384

Problem description & steps to reproduce

  1. Download https://huggingface.co/mradermacher/Qwen3-235B-A22B-i1-GGUF (IQ3_M)
  2. Add the following line to /etc/sysctl.conf (e.g. with sudo nano /etc/sysctl.conf):
    iogpu.wired_limit_mb=122880
  3. Restart the computer
  4. Run ./llama-cli -m Qwen3-235B-A22B.i1-IQ3_M.gguf --no-mmap -fa -c 8192 (or -c 16384)
  5. Observe the slow loading times and unnecessary swapping in the Memory section of Activity Monitor
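For reference, the wired limit in step 2 can be checked against the machine's physical memory. This is a minimal sketch that uses only the figures in this report (131,072 MiB of RAM, 122,880 MiB limit); the helper name `gib_to_mib` is illustrative and not part of any tool.

```python
MIB_PER_GIB = 1024

def gib_to_mib(gib: int) -> int:
    """Convert GiB to MiB (hypothetical helper for this sketch)."""
    return gib * MIB_PER_GIB

# 128 GiB of unified memory, i.e. the 131,072 MiB reported for this M3 Max
physical_ram_mib = gib_to_mib(128)

# Value written to /etc/sysctl.conf in step 2
wired_limit_mib = 122_880

# Memory that stays outside the GPU wired limit
reserved_for_system_mib = physical_ram_mib - wired_limit_mib

print(f"wired limit: {wired_limit_mib // MIB_PER_GIB} GiB")
print(f"left for the rest of the system: {reserved_for_system_mib} MiB")
```

In other words, the setting allows the GPU to wire 120 GiB and leaves 8 GiB for everything else.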

Videos showing extended swap/unswap times:
8k ctx: https://github.com/user-attachments/assets/7cde30ce-3770-4582-85cf-2c4382f527dc
16k ctx: https://github.com/user-attachments/assets/b97a88c5-36f5-4477-8525-fd78050eadc2
(The drop in memory pressure at the end of each video marks the point where the model finishes loading.)

With an 8k-16k context, the model should use only 100,127-101,671 MiB of memory, per the calculations below. An M3 Max with 131,072 MiB of memory and a 122,880 MiB VRAM limit should be able to handle that without the long swap/unswap process.

8k ctx

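| Component       | GPU (MiB) | CPU (MiB) | Total (MiB) |
|-----------------|----------:|----------:|------------:|
| Model buffer    | 98,030.93 |    255.02 |   98,285.95 |
| KV-cache buffer |  1,504.00 |      0.00 |    1,504.00 |
| Compute buffer  |    312.75 |     24.01 |      336.76 |
| Output buffer   |      0.00 |      0.58 |        0.58 |
| Grand total     | 99,847.68 |    279.61 |  100,127.29 |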

16k ctx

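| Component       |  GPU (MiB) | CPU (MiB) | Total (MiB) |
|-----------------|-----------:|----------:|------------:|
| Model buffer    |  98,030.93 |    255.02 |   98,285.95 |
| KV-cache buffer |   3,008.00 |      0.00 |    3,008.00 |
| Compute buffer  |     336.75 |     40.01 |      376.76 |
| Output buffer   |       0.00 |      0.58 |        0.58 |
| Grand total     | 101,375.68 |    295.61 |  101,671.29 |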
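
The grand totals above can be reproduced from the per-component figures, and the GPU share compared against the 122,880 MiB wired limit. The following is a rough sketch: the buffer sizes are copied from the two tables, while the names `BUFFERS`, `WIRED_LIMIT_MIB`, and `totals` are made up for illustration.

```python
from decimal import Decimal

# (GPU MiB, CPU MiB) per component, copied from the tables in this report
BUFFERS = {
    "8k": {
        "model": ("98030.93", "255.02"),
        "kv_cache": ("1504.00", "0.00"),
        "compute": ("312.75", "24.01"),
        "output": ("0.00", "0.58"),
    },
    "16k": {
        "model": ("98030.93", "255.02"),
        "kv_cache": ("3008.00", "0.00"),
        "compute": ("336.75", "40.01"),
        "output": ("0.00", "0.58"),
    },
}

# iogpu.wired_limit_mb from the reproduction steps
WIRED_LIMIT_MIB = Decimal("122880")

def totals(ctx: str):
    """Return (GPU, CPU, combined) MiB for one context size."""
    gpu = sum(Decimal(g) for g, _ in BUFFERS[ctx].values())
    cpu = sum(Decimal(c) for _, c in BUFFERS[ctx].values())
    return gpu, cpu, gpu + cpu

for ctx in BUFFERS:
    gpu, cpu, total = totals(ctx)
    headroom = WIRED_LIMIT_MIB - gpu
    print(f"{ctx}: GPU {gpu} + CPU {cpu} = {total} MiB; "
          f"GPU headroom under wired limit: {headroom} MiB")
```

By these numbers the GPU allocation stays roughly 23,032 MiB (8k) and 21,504 MiB (16k) under the wired limit, which is why the prolonged swapping looks unnecessary.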

First Bad Commit

No response

Relevant log output
