Error running the model server in GPU mode

I ran the command from the README:
./server -m ./models/codeshell-chat-q4_0.gguf --host 127.0.0.1 --port 8080

The error output is as follows:
ggml_metal_init: GPU name: Apple M1
ggml_metal_init: GPU family: MTLGPUFamilyApple7 (1007)
ggml_metal_init: hasUnifiedMemory = true
ggml_metal_init: recommendedMaxWorkingSetSize = 5461.34 MB
ggml_metal_init: maxTransferRate = built-in GPU
llama_new_context_with_model: compute buffer total size = 558.13 MB
llama_new_context_with_model: max tensor size = 224.77 MB
ggml_metal_add_buffer: allocated 'data ' buffer, size = 4096.00 MB, offs = 0
ggml_metal_add_buffer: allocated 'data ' buffer, size = 486.91 MB, offs = 4059267072, ( 4583.53 / 5461.34)
ggml_metal_add_buffer: allocated 'kv ' buffer, size = 1346.00 MB, ( 5929.53 / 5461.34), warning: current allocated size is greater than the recommended max working set size
ggml_metal_add_buffer: allocated 'alloc ' buffer, size = 552.02 MB, ( 6481.55 / 5461.34), warning: current allocated size is greater than the recommended max working set size
ggml_metal_graph_compute: command buffer 0 failed with status 5
GGML_ASSERT: ggml-metal.m:1459: false
Abort trap: 6

System information:
M1 MacBook Pro
macOS Sonoma 14.1
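
As a rough check on the log above, the running totals printed by ggml_metal_add_buffer can be reproduced by hand. The sketch below is only arithmetic on the figures copied from the log; the variable and function names are made up for illustration, and the idea that exceeding recommendedMaxWorkingSetSize is what causes "command buffer 0 failed with status 5" is an inference from the warnings, not a confirmed diagnosis.

```python
# All figures are in MB and copied from the ggml_metal_* log lines above.
RECOMMENDED_MAX_MB = 5461.34   # recommendedMaxWorkingSetSize
START_TOTAL_MB = 4583.53       # running total reported after the two 'data' buffers
LATER_BUFFERS_MB = {"kv": 1346.00, "alloc": 552.02}


def cumulative_usage(start_mb, buffers):
    """Return (buffer name, running total) pairs after adding each buffer in order."""
    totals = []
    total = start_mb
    for name, size in buffers.items():
        total += size
        totals.append((name, round(total, 2)))
    return totals


totals = cumulative_usage(START_TOTAL_MB, LATER_BUFFERS_MB)
for name, total in totals:
    status = "over" if total > RECOMMENDED_MAX_MB else "within"
    print(f"after '{name}': {total:.2f} MB ({status} the recommended {RECOMMENDED_MAX_MB} MB)")

# How far the final allocation exceeds the recommended working set size.
overshoot_mb = round(totals[-1][1] - RECOMMENDED_MAX_MB, 2)
print(f"overshoot: {overshoot_mb:.2f} MB")
```

The totals match the log: the kv buffer alone already pushes the allocation past the recommended limit, and the final alloc buffer ends up roughly 1 GB above it.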

Error running the model server in GPU mode #42