
[CANN] CANN with glm-4.6-q4_0 #16586

@zzc98

Description


I am using llama.cpp b6765 to run GLM-4.6 Q4_0 (from https://huggingface.co/bartowski/zai-org_GLM-4.6-GGUF) on the CANN platform. Inference is very slow (npu-smi info shows AICore at 0), and tool calls fail.

The build commands are as follows:

cmake -B build -DGGML_CANN=on -DCMAKE_BUILD_TYPE=release -DUSE_ACL_GRAPH=ON
cmake --build build --config release -j 32

The startup command is as follows:

source /usr/local/Ascend/ascend-toolkit/set_env.sh
export GGML_CANN_ACL_GRAPH=0
build/bin/llama-server \
--model /mnt/1/model/GLM-4.6-GGUF/Q4_0/zai-org_GLM-4.6-Q4_0/zai-org_GLM-4.6-Q4_0-00001-of-00006.gguf \
--host 0.0.0.0 \
--port 1025 \
--ctx-size 204800 \
--parallel 1 \
--no-context-shift \
--gpu-layers -1 \
--alias glm-4.6 \
--jinja \
--no-webui \
--metrics

If 'export GGML_CANN_ACL_GRAPH=1' is set instead, the following error is reported:

CANN error: EE9999: Inner Error!
EE9999: [PID: 881460] 2025-10-15-14:43:16.351.702 Not allow to synchronize captured stream stream_id=2.[FUNC:StreamSynchronize][FILE:api_error.cc][LINE:884]
TraceBack (most recent call last):
rtStreamSynchronize execute failed, reason=[stream is captured][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
synchronize stream failed, runtime result = 107027[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
current device: 0, in function ggml_cann_mul_mat_id_quant at /mnt/0/zzc/llama.cpp-b6765/ggml/src/ggml-cann/aclnn_ops.cpp:3016
aclrtSynchronizeStream(ctx.stream())

Tool calls fail with the following error:

{"error":{"code":500,"message":"Unknown argument ensure_ascii for function tojson at row 11, column 37:\n{% for tool in tools %}\n{{ tool | tojson(ensure_ascii=False) }}\n                                    ^\n{% endfor %}\n at row 11, column 1:\n{% for tool in tools %}\n{{ tool | tojson(ensure_ascii=False) }}\n^\n{% endfor %}\n at row 10, column 24:\n<tools>\n{% for tool in tools %}\n                       ^\n{{ tool | tojson(ensure_ascii=False) }}\n at row 10, column 1:\n<tools>\n{% for tool in tools %}\n^\n{{ tool | tojson(ensure_ascii=False) }}\n at row 2, column 17:\n[gMASK]<sop>\n{%- if tools -%}\n                ^\n<|system|>\n at row 2, column 1:\n[gMASK]<sop>\n{%- if tools -%}\n^\n<|system|>\n at row 1, column 1:\n[gMASK]<sop>\n^\n{%- if tools -%}\n","type":"server_error"}}

Metadata

Assignees: No one assigned
Labels: No labels
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
