Description
Name and Version
$ ./llama-cli --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes
version: 5600 (d17a809)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
llama-server
Command line
./llama-server -m ~/.cache/huggingface/hub/models--Qwen--Qwen2.5-7B-Instruct-GGUF/snapshots/293ca9a10157b0e5fc5cb32af8b636a88bede891/qwen2.5-7b-instruct-q5_k_m-00001-of-00002.gguf -c 4096 --n-gpu-layers 33 --jinja
Problem description & steps to reproduce
Description:
When running the server, I expect every tool call within the same response to be parsed. Currently, only one tool call ends up in `tool_calls`; the other is returned unparsed, as raw `<tool_call>` text, inside the message `content`.
Expected Behavior:
All tool calls within the same response should be parsed and handled individually and correctly.
Actual Behavior:
Only one tool call is parsed into `tool_calls`. The other tool call is returned unparsed within the message `content`, even though `finish_reason` is `tool_calls`.
curl http://localhost:8080/v1/chat/completions -d '{
"model": "gpt-3.5-turbo",
"parallel_tool_calls": true,
"messages": [
{"role": "system", "content": "You are a chatbot that uses tools/functions. Dont overthink things."},
{"role": "user", "content": "What is the weather in Istanbul?"},
{"role": "user", "content": "How many inhabitants does Istanbul have?"}
],
"tools": [
{
"type": "function",
"function": {
"name": "get_current_weather",
"description": "Get the current weather in a given location",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and country/state, e.g. `San Francisco, CA`, or `Paris, France`"
}
},
"required": ["location"]
}
}
},
{
"type": "function",
"function": {
"name": "get_population",
"description": "Get the population of a given city",
"parameters": {
"type": "object",
"properties": {
"city": {
"type": "string",
"description": "The name of the city, e.g. `Istanbul`, `New York`"
}
},
"required": ["city"]
}
}
}
]
}'
{"choices":[{"finish_reason":"tool_calls","index":0,"message":{"role":"assistant","content":"<tool_call>\n{\"name\": \"get_current_weather\", \"arguments\": {\"location\": \"Istanbul, Turkey\"}}\n</tool_call>","tool_calls":[{"type":"function","function":{"name":"get_population","arguments":"{\"city\":\"Istanbul\"}"},"id":"O8dX7XA6BgsC3TZV7YE1nLXyroFNyVR3"}]}}],"created":1749553457,"model":"gpt-3.5-turbo","system_fingerprint":"b5621-2bb04670","object":"chat.completion","usage":{"completion_tokens":45,"prompt_tokens":292,"total_tokens":337},"id":"chatcmpl-Y7CIK8Xswaz74EVV24v9LP2T85GMnMPG","timings":{"prompt_n":292,"prompt_ms":175.339,"prompt_per_token_ms":0.6004760273972602,"prompt_per_second":1665.3454165929998,"predicted_n":45,"predicted_ms":841.967,"predicted_per_token_ms":18.710377777777776,"predicted_per_second":53.44627521031109}}
First Bad Commit
This problem was introduced after b5478.
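It seems that some of the common_chat_parse methods no longer apply the tool-call regex in a while loop, so only one match is consumed and the remaining text is passed through as content.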
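To illustrate the suspected cause, here is a minimal Python sketch of the looping behavior I would expect. It is not the llama.cpp implementation: the regex, the `parse_tool_calls` helper, and the `raw_output` string are all assumptions made for illustration (the raw model output is reconstructed from the responses in this report, not captured from the server).

```python
import json
import re

# Hypothetical pattern for Hermes/Qwen-style tool-call blocks (not copied from llama.cpp).
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


def parse_tool_calls(text):
    """Consume every <tool_call> block and return (remaining_content, tool_calls)."""
    tool_calls = []
    leftovers = []
    pos = 0
    # Iterate over ALL matches; stopping after the first match would leave
    # the other tool call behind in the content.
    for match in TOOL_CALL_RE.finditer(text):
        leftovers.append(text[pos:match.start()])
        call = json.loads(match.group(1))
        tool_calls.append({
            "name": call["name"],
            "arguments": json.dumps(call["arguments"]),
        })
        pos = match.end()
    leftovers.append(text[pos:])
    content = "".join(leftovers).strip()
    return (content or None), tool_calls


# Assumed raw model output containing two tool calls.
raw_output = (
    '<tool_call>\n{"name": "get_current_weather", '
    '"arguments": {"location": "Istanbul, Turkey"}}\n</tool_call>\n'
    '<tool_call>\n{"name": "get_population", '
    '"arguments": {"city": "Istanbul"}}\n</tool_call>'
)

content, calls = parse_tool_calls(raw_output)
print(content)
print([c["name"] for c in calls])
```

With a loop like this, both calls are extracted and no `<tool_call>` text is left in the content, which matches the response shape seen before b5478.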
Relevant log output
Before b5478:
{"choices":[{"finish_reason":"tool_calls","index":0,"message":{"role":"assistant","content":null,"tool_calls":[{"type":"function","function":{"name":"get_population","arguments":"{\"city\":\"Istanbul\"}"},"id":"qrf5TqOAHHkbNb2Pdcg8jBmQDP4rd8Gk"},{"type":"function","function":{"name":"get_current_weather","arguments":"{\"location\":\"Istanbul\"}"},"id":"1emAaXjPpDuA3tbhkWsOgS5Tc15tiZbs"}]}}],"created":1749554070,"model":"gpt-3.5-turbo","system_fingerprint":"b5437-b7a17463","object":"chat.completion","usage":{"completion_tokens":43,"prompt_tokens":292,"total_tokens":335},"id":"chatcmpl-Ci0fwu2tok98UAxCLC0JqGv1C3xmf5BI","timings":{"prompt_n":292,"prompt_ms":184.796,"prompt_per_token_ms":0.6328630136986301,"prompt_per_second":1580.1207818351047,"predicted_n":43,"predicted_ms":806.196,"predicted_per_token_ms":18.748744186046512,"predicted_per_second":53.336905665619774}}
After b5478:
{"choices":[{"finish_reason":"tool_calls","index":0,"message":{"role":"assistant","content":"<tool_call>\n{\"name\": \"get_current_weather\", \"arguments\": {\"location\": \"Istanbul, Turkey\"}}\n</tool_call>","tool_calls":[{"type":"function","function":{"name":"get_population","arguments":"{\"city\":\"Istanbul\"}"},"id":"O8dX7XA6BgsC3TZV7YE1nLXyroFNyVR3"}]}}],"created":1749553457,"model":"gpt-3.5-turbo","system_fingerprint":"b5621-2bb04670","object":"chat.completion","usage":{"completion_tokens":45,"prompt_tokens":292,"total_tokens":337},"id":"chatcmpl-Y7CIK8Xswaz74EVV24v9LP2T85GMnMPG","timings":{"prompt_n":292,"prompt_ms":175.339,"prompt_per_token_ms":0.6004760273972602,"prompt_per_second":1665.3454165929998,"predicted_n":45,"predicted_ms":841.967,"predicted_per_token_ms":18.710377777777776,"predicted_per_second":53.44627521031109}}
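For anyone who wants to check a build for this regression, the following Python sketch flags responses whose message content still contains a raw `<tool_call>` block. The helper name `find_unparsed_tool_calls` is hypothetical, and the two dictionaries are trimmed copies of the responses above (ids, usage, and timings omitted).

```python
def find_unparsed_tool_calls(response):
    """Return the indices of choices whose content still holds a raw <tool_call> block."""
    bad = []
    for choice in response.get("choices", []):
        # "content" may be null (None) when all tool calls were parsed.
        content = choice.get("message", {}).get("content") or ""
        if "<tool_call>" in content:
            bad.append(choice.get("index"))
    return bad


# Trimmed copy of the "Before b5478" response: content is null, two parsed calls.
before = {"choices": [{"finish_reason": "tool_calls", "index": 0, "message": {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {"type": "function", "function": {
            "name": "get_population", "arguments": '{"city":"Istanbul"}'}},
        {"type": "function", "function": {
            "name": "get_current_weather", "arguments": '{"location":"Istanbul"}'}},
    ],
}}]}

# Trimmed copy of the "After b5478" response: one call left as text in content.
after = {"choices": [{"finish_reason": "tool_calls", "index": 0, "message": {
    "role": "assistant",
    "content": '<tool_call>\n{"name": "get_current_weather", '
               '"arguments": {"location": "Istanbul, Turkey"}}\n</tool_call>',
    "tool_calls": [
        {"type": "function", "function": {
            "name": "get_population", "arguments": '{"city":"Istanbul"}'}},
    ],
}}]}

print(find_unparsed_tool_calls(before))
print(find_unparsed_tool_calls(after))
```

The check is deliberately simple: it only looks for the literal `<tool_call>` marker, so it would need adjusting for chat templates that use a different tool-call syntax.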