Misc. bug: Server/Chat parallel tool calling not working #14101

Open
@agonzc34

Description

Name and Version

$ ./llama-cli --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes
version: 5600 (d17a809)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu

Operating systems

Linux

Which llama.cpp modules do you know to be affected?

llama-server

Command line

./llama-server -m ~/.cache/huggingface/hub/models--Qwen--Qwen2.5-7B-Instruct-GGUF/snapshots/293ca9a10157b0e5fc5cb32af8b636a88bede891/qwen2.5-7b-instruct-q5_k_m-00001-of-00002.gguf -c 4096 --n-gpu-layers 33 --jinja

Problem description & steps to reproduce

Description:
When running the server, I expect multiple tool calls within the same response to be parsed correctly. Currently, the first tool call is parsed without issues, but the second one is not: instead of appearing in the `tool_calls` array, the unparsed second tool call is returned verbatim inside the message `content`.

Expected Behavior:
All tool calls within the same response should be parsed and handled individually and correctly.

Actual Behavior:
Only the first tool call is parsed correctly. Subsequent tool calls are returned unparsed within the message content.

curl http://localhost:8080/v1/chat/completions -d '{
  "model": "gpt-3.5-turbo",
  "parallel_tool_calls": true,
  "messages": [
    {"role": "system", "content": "You are a chatbot that uses tools/functions. Dont overthink things."},
    {"role": "user", "content": "What is the weather in Istanbul?"},
    {"role": "user", "content": "How many inhabitants does Istanbul have?"}
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "The city and country/state, e.g. `San Francisco, CA`, or `Paris, France`"
            }
          },
          "required": ["location"]
        }
      }
    },
    {
      "type": "function",
      "function": {
        "name": "get_population",
        "description": "Get the population of a given city",
        "parameters": {
          "type": "object",
          "properties": {
            "city": {
              "type": "string",
              "description": "The name of the city, e.g. `Istanbul`, `New York`"
            }
          },
          "required": ["city"]
        }
      }
    }
  ]
}'
{"choices":[{"finish_reason":"tool_calls","index":0,"message":{"role":"assistant","content":"<tool_call>\n{\"name\": \"get_current_weather\", \"arguments\": {\"location\": \"Istanbul, Turkey\"}}\n</tool_call>","tool_calls":[{"type":"function","function":{"name":"get_population","arguments":"{\"city\":\"Istanbul\"}"},"id":"O8dX7XA6BgsC3TZV7YE1nLXyroFNyVR3"}]}}],"created":1749553457,"model":"gpt-3.5-turbo","system_fingerprint":"b5621-2bb04670","object":"chat.completion","usage":{"completion_tokens":45,"prompt_tokens":292,"total_tokens":337},"id":"chatcmpl-Y7CIK8Xswaz74EVV24v9LP2T85GMnMPG","timings":{"prompt_n":292,"prompt_ms":175.339,"prompt_per_token_ms":0.6004760273972602,"prompt_per_second":1665.3454165929998,"predicted_n":45,"predicted_ms":841.967,"predicted_per_token_ms":18.710377777777776,"predicted_per_second":53.44627521031109}}

First Bad Commit

This problem was introduced after b5478.
It looks like some of the `common_chat_parse` methods dropped the `while` loop that applied the regex repeatedly, so only the first `<tool_call>` block is consumed and the rest are left in the content.

Relevant log output

Before b5478:
{"choices":[{"finish_reason":"tool_calls","index":0,"message":{"role":"assistant","content":null,"tool_calls":[{"type":"function","function":{"name":"get_population","arguments":"{\"city\":\"Istanbul\"}"},"id":"qrf5TqOAHHkbNb2Pdcg8jBmQDP4rd8Gk"},{"type":"function","function":{"name":"get_current_weather","arguments":"{\"location\":\"Istanbul\"}"},"id":"1emAaXjPpDuA3tbhkWsOgS5Tc15tiZbs"}]}}],"created":1749554070,"model":"gpt-3.5-turbo","system_fingerprint":"b5437-b7a17463","object":"chat.completion","usage":{"completion_tokens":43,"prompt_tokens":292,"total_tokens":335},"id":"chatcmpl-Ci0fwu2tok98UAxCLC0JqGv1C3xmf5BI","timings":{"prompt_n":292,"prompt_ms":184.796,"prompt_per_token_ms":0.6328630136986301,"prompt_per_second":1580.1207818351047,"predicted_n":43,"predicted_ms":806.196,"predicted_per_token_ms":18.748744186046512,"predicted_per_second":53.336905665619774}}

After b5478:
{"choices":[{"finish_reason":"tool_calls","index":0,"message":{"role":"assistant","content":"<tool_call>\n{\"name\": \"get_current_weather\", \"arguments\": {\"location\": \"Istanbul, Turkey\"}}\n</tool_call>","tool_calls":[{"type":"function","function":{"name":"get_population","arguments":"{\"city\":\"Istanbul\"}"},"id":"O8dX7XA6BgsC3TZV7YE1nLXyroFNyVR3"}]}}],"created":1749553457,"model":"gpt-3.5-turbo","system_fingerprint":"b5621-2bb04670","object":"chat.completion","usage":{"completion_tokens":45,"prompt_tokens":292,"total_tokens":337},"id":"chatcmpl-Y7CIK8Xswaz74EVV24v9LP2T85GMnMPG","timings":{"prompt_n":292,"prompt_ms":175.339,"prompt_per_token_ms":0.6004760273972602,"prompt_per_second":1665.3454165929998,"predicted_n":45,"predicted_ms":841.967,"predicted_per_token_ms":18.710377777777776,"predicted_per_second":53.44627521031109}}
