Skip to content

GPU方式运行模型服务出错 #42

@a652

Description

@a652

按照README运行命令:
./server -m ./models/codeshell-chat-q4_0.gguf --host 127.0.0.1 --port 8080

报错信息如下:
ggml_metal_init: GPU name: Apple M1
ggml_metal_init: GPU family: MTLGPUFamilyApple7 (1007)
ggml_metal_init: hasUnifiedMemory = true
ggml_metal_init: recommendedMaxWorkingSetSize = 5461.34 MB
ggml_metal_init: maxTransferRate = built-in GPU
llama_new_context_with_model: compute buffer total size = 558.13 MB
llama_new_context_with_model: max tensor size = 224.77 MB
ggml_metal_add_buffer: allocated 'data ' buffer, size = 4096.00 MB, offs = 0
ggml_metal_add_buffer: allocated 'data ' buffer, size = 486.91 MB, offs = 4059267072, ( 4583.53 / 5461.34)
ggml_metal_add_buffer: allocated 'kv ' buffer, size = 1346.00 MB, ( 5929.53 / 5461.34), warning: current allocated size is greater than the recommended max working set size
ggml_metal_add_buffer: allocated 'alloc ' buffer, size = 552.02 MB, ( 6481.55 / 5461.34), warning: current allocated size is greater than the recommended max working set size
ggml_metal_graph_compute: command buffer 0 failed with status 5
GGML_ASSERT: ggml-metal.m:1459: false
Abort trap: 6

电脑信息:
M1 MacBook Pro
MacOS Sonoma 14.1

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions