Misc. bug: Server/Chat parallel tool calling not working #14101

Name and Version

$ ./llama-cli --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes
version: 5600 (d17a809)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu

Operating systems

Linux

Which llama.cpp modules do you know to be affected?

llama-server

Command line

./llama-server -m ~/.cache/huggingface/hub/models--Qwen--Qwen2.5-7B-Instruct-GGUF/snapshots/293ca9a10157b0e5fc5cb32af8b636a88bede891/qwen2.5-7b-instruct-q5_k_m-00001-of-00002.gguf -c 4096 --n-gpu-layers 33 --jinja

Problem description & steps to reproduce

Description:
When running the server with "parallel_tool_calls": true, I expect multiple tool calls in the same response to all be parsed into the tool_calls array. Currently, the first tool call is parsed without issues, but the second is not: it is returned unparsed, as raw <tool_call>...</tool_call> text, in the message content.

Expected Behavior:
All tool calls within the same response should be parsed and handled individually and correctly.

Actual Behavior:
Only the first tool call is parsed correctly. Subsequent tool calls are returned unparsed within the message content, and finish_reason is still "tool_calls".

Steps to reproduce:

curl http://localhost:8080/v1/chat/completions -d '{
  "model": "gpt-3.5-turbo",
  "parallel_tool_calls": true,
  "messages": [
    {"role": "system", "content": "You are a chatbot that uses tools/functions. Dont overthink things."},
    {"role": "user", "content": "What is the weather in Istanbul?"},
    {"role": "user", "content": "How many inhabitants does Istanbul have?"}
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "The city and country/state, e.g. `San Francisco, CA`, or `Paris, France`"
            }
          },
          "required": ["location"]
        }
      }
    },
    {
      "type": "function",
      "function": {
        "name": "get_population",
        "description": "Get the population of a given city",
        "parameters": {
          "type": "object",
          "properties": {
            "city": {
              "type": "string",
              "description": "The name of the city, e.g. `Istanbul`, `New York`"
            }
          },
          "required": ["city"]
        }
      }
    }
  ]
}'

Response:

{"choices":[{"finish_reason":"tool_calls","index":0,"message":{"role":"assistant","content":"<tool_call>\n{\"name\": \"get_current_weather\", \"arguments\": {\"location\": \"Istanbul, Turkey\"}}\n</tool_call>","tool_calls":[{"type":"function","function":{"name":"get_population","arguments":"{\"city\":\"Istanbul\"}"},"id":"O8dX7XA6BgsC3TZV7YE1nLXyroFNyVR3"}]}}],"created":1749553457,"model":"gpt-3.5-turbo","system_fingerprint":"b5621-2bb04670","object":"chat.completion","usage":{"completion_tokens":45,"prompt_tokens":292,"total_tokens":337},"id":"chatcmpl-Y7CIK8Xswaz74EVV24v9LP2T85GMnMPG","timings":{"prompt_n":292,"prompt_ms":175.339,"prompt_per_token_ms":0.6004760273972602,"prompt_per_second":1665.3454165929998,"predicted_n":45,"predicted_ms":841.967,"predicted_per_token_ms":18.710377777777776,"predicted_per_second":53.44627521031109}}

First Bad Commit

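This problem was introduced after b5478.
It seems that some of the common_chat_parse methods no longer use a while loop to apply the tool-call regex repeatedly, so only one match is extracted and the rest of the output is left in the message content.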
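
To illustrate the suspected cause, here is a minimal Python sketch contrasting a single regex match with a loop over all matches. It is not the llama.cpp implementation (which is C++): the function names, the regex, and the SAMPLE string are hypothetical, and only the <tool_call> ... </tool_call> wrapper is taken from the response above.

```python
import json
import re

# Hypothetical pattern for the <tool_call> wrapper seen in the response above.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

# Hypothetical raw model output containing two parallel tool calls.
SAMPLE = (
    '<tool_call>\n{"name": "get_current_weather", '
    '"arguments": {"location": "Istanbul, Turkey"}}\n</tool_call>\n'
    '<tool_call>\n{"name": "get_population", '
    '"arguments": {"city": "Istanbul"}}\n</tool_call>'
)


def parse_first_only(text):
    """Mimic the suspected regression: consume one match, keep the rest as content."""
    m = TOOL_CALL_RE.search(text)
    if m is None:
        return text.strip() or None, []
    remainder = (text[:m.start()] + text[m.end():]).strip()
    return remainder or None, [json.loads(m.group(1))]


def parse_all(text):
    """Loop over every match so that all parallel tool calls are extracted."""
    calls = [json.loads(m.group(1)) for m in TOOL_CALL_RE.finditer(text)]
    remainder = TOOL_CALL_RE.sub("", text).strip()
    return remainder or None, calls
```

With SAMPLE, parse_first_only returns one parsed call and leaves the other <tool_call> block in the content, which matches the shape of the bad response; parse_all returns both calls and no content, which matches the shape of the pre-b5478 response.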

Relevant log output

Before b5478:
{"choices":[{"finish_reason":"tool_calls","index":0,"message":{"role":"assistant","content":null,"tool_calls":[{"type":"function","function":{"name":"get_population","arguments":"{\"city\":\"Istanbul\"}"},"id":"qrf5TqOAHHkbNb2Pdcg8jBmQDP4rd8Gk"},{"type":"function","function":{"name":"get_current_weather","arguments":"{\"location\":\"Istanbul\"}"},"id":"1emAaXjPpDuA3tbhkWsOgS5Tc15tiZbs"}]}}],"created":1749554070,"model":"gpt-3.5-turbo","system_fingerprint":"b5437-b7a17463","object":"chat.completion","usage":{"completion_tokens":43,"prompt_tokens":292,"total_tokens":335},"id":"chatcmpl-Ci0fwu2tok98UAxCLC0JqGv1C3xmf5BI","timings":{"prompt_n":292,"prompt_ms":184.796,"prompt_per_token_ms":0.6328630136986301,"prompt_per_second":1580.1207818351047,"predicted_n":43,"predicted_ms":806.196,"predicted_per_token_ms":18.748744186046512,"predicted_per_second":53.336905665619774}}

After b5478:
{"choices":[{"finish_reason":"tool_calls","index":0,"message":{"role":"assistant","content":"<tool_call>\n{\"name\": \"get_current_weather\", \"arguments\": {\"location\": \"Istanbul, Turkey\"}}\n</tool_call>","tool_calls":[{"type":"function","function":{"name":"get_population","arguments":"{\"city\":\"Istanbul\"}"},"id":"O8dX7XA6BgsC3TZV7YE1nLXyroFNyVR3"}]}}],"created":1749553457,"model":"gpt-3.5-turbo","system_fingerprint":"b5621-2bb04670","object":"chat.completion","usage":{"completion_tokens":45,"prompt_tokens":292,"total_tokens":337},"id":"chatcmpl-Y7CIK8Xswaz74EVV24v9LP2T85GMnMPG","timings":{"prompt_n":292,"prompt_ms":175.339,"prompt_per_token_ms":0.6004760273972602,"prompt_per_second":1665.3454165929998,"predicted_n":45,"predicted_ms":841.967,"predicted_per_token_ms":18.710377777777776,"predicted_per_second":53.44627521031109}}
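
For anyone bisecting this, a small client-side check can flag the regression in a saved response body. This is only a sketch: the helper name check_parallel_tool_calls and the default of two expected calls are assumptions based on the request above, where the prompt asks for both the weather and the population.

```python
import json


def check_parallel_tool_calls(response_json, expected=2):
    """Return a list of problems found in a /v1/chat/completions response body."""
    message = json.loads(response_json)["choices"][0]["message"]
    tool_calls = message.get("tool_calls") or []
    content = message.get("content") or ""
    problems = []
    # Fewer parsed calls than expected suggests some were not extracted.
    if len(tool_calls) < expected:
        problems.append(f"expected {expected} tool calls, got {len(tool_calls)}")
    # A raw wrapper tag in the content means a call was left unparsed.
    if "<tool_call>" in content:
        problems.append("unparsed <tool_call> block left in message content")
    return problems
```

Passing the "After b5478" body should report both problems, while the "Before b5478" body should report none. Note that the model is free to emit a single call, so a count below expected is a hint rather than proof; the leftover <tool_call> tag in the content is the more reliable signal.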
