webui: add OAI-Compat Harmony tool-call streaming visualization and persistence in chat UI #16618

ServeurpersoCom · 2025-10-16T17:17:13Z

- Purely visual and diagnostic change, no effect on model context, prompt
  construction, or inference behavior

- Captured assistant tool call payloads during streaming and non-streaming
  completions, and persisted them in chat state and storage for downstream use

- Exposed parsed tool call labels beneath the assistant's model info line
  with graceful fallback when parsing fails

- Added tool call badges beneath assistant responses that expose JSON tooltips
  and copy their payloads when clicked, matching the existing model badge styling

- Added a user-facing setting to toggle tool call visibility to the Developer
  settings section directly under the model selector option

Close #16597
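
For readers unfamiliar with how tool calls arrive over an OpenAI-compatible stream, here is a minimal Python sketch of the capture step described above. It is not the webui's TypeScript code. It assumes the usual chat-completions chunk shape, where each `delta.tool_calls` fragment carries an `index` and the `function.arguments` string arrives in pieces. The helper names `accumulate_tool_calls` and `tool_call_badge` are hypothetical.

```python
import json


def accumulate_tool_calls(chunks):
    """Merge streamed `delta.tool_calls` fragments into complete tool calls."""
    calls = {}  # keyed by the `index` carried in each fragment
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            for frag in delta.get("tool_calls") or []:
                call = calls.setdefault(
                    frag.get("index", 0),
                    {"id": "", "type": "function",
                     "function": {"name": "", "arguments": ""}},
                )
                if frag.get("id"):
                    call["id"] = frag["id"]
                fn = frag.get("function") or {}
                # Assumption: the name is sent whole, not split across chunks.
                if fn.get("name"):
                    call["function"]["name"] = fn["name"]
                # Arguments are string pieces of a single JSON document.
                call["function"]["arguments"] += fn.get("arguments") or ""
    return [calls[i] for i in sorted(calls)]


def tool_call_badge(call):
    """Return (label, tooltip) for one call; fall back to raw text if parsing fails."""
    name = call["function"]["name"] or "tool call"
    raw = call["function"]["arguments"]
    try:
        tooltip = json.dumps(json.loads(raw), indent=2)
    except json.JSONDecodeError:
        tooltip = raw  # graceful fallback: show the unparsed payload
    return name, tooltip
```

Because the arguments are only valid JSON once the last fragment has arrived, a badge rendered mid-stream takes the fallback path and shows the raw string.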

ServeurpersoCom · 2025-10-17T08:29:47Z

I need to do some cleanup: the patch was not merged properly on my side. Converting to draft.

ServeurpersoCom · 2025-10-18T21:09:58Z

This PR is now clean, but it was developed after this one: #16562

allozaur · 2025-10-22T16:24:55Z

Alright, @ServeurpersoCom, let's move forward with this one after merging #16562 ;) Let me know when you've addressed the merge conflicts and I'll gladly review the code

ServeurpersoCom · 2025-10-22T16:45:12Z

For the tool call inspector, do you prefer one spoiler block per tool call, or a single aggregated spoiler wrapping all tool calls in the message?

It's rebased and reworked now. I force-pushed :)

ServeurpersoCom · 2025-10-22T17:09:37Z

Feel free to dissect the architecture as deeply as you want! Component boundaries, store coupling, service layering, anything that smells non-idiomatic.
Also, if we end up polishing this feature further, I’m thinking it could live in a dedicated module for cleaner boundaries?

lib/
 └─ toolcalls/
     ├─ toolcall-service.ts
     ├─ toolcall-store.ts
     ├─ ToolCallBlock.svelte
     └─ ToolCallItem.svelte

ServeurpersoCom · 2025-10-22T17:25:39Z

And we could even imagine the architecture being reusable later: for example, a small JavaScript execution module decoupled from the UI, so the model could actually interact with a JS thread running code it wrote itself.
That would also cover, in a more generic way, the proposal from PR #13501 by @samolego, but in this case the model would generate and run its own JS tools. Done properly, it’s no more of a security risk than the HTML/JS preview you get in Hugging Face Chat or Claude!

ServeurpersoCom · 2025-10-22T17:37:30Z

Includes a very small optimization from the previous PR (scroll listener removal). It landed here intentionally :D

ServeurpersoCom · 2025-11-01T19:39:17Z

Testing:

Add this tools definition:

{
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "simple_addition_tool",
        "description": "A dummy calculator tool used for testing multi-argument tool call streaming.",
        "parameters": {
          "type": "object",
          "properties": {
            "a": {
              "type": "number",
              "description": "The first number to add."
            },
            "b": {
              "type": "number",
              "description": "The second number to add."
            }
          },
          "required": ["a", "b"]
        }
      }
    }
  ]
}

Here: [screenshot not preserved]

And ask the model: [screenshot not preserved]

allozaur

Just a few cosmetic changes

tools/server/webui/src/lib/components/app/chat/ChatMessages/ChatMessageAssistant.svelte

ServeurpersoCom · 2025-11-13T19:11:22Z

Rebase / Format / Build

allozaur · 2025-11-14T20:10:01Z

@ServeurpersoCom please re-base & rebuild

ServeurpersoCom · 2025-11-14T20:51:19Z

rebased and rebuilt

SadaleNet · 2025-11-17T15:21:14Z

Hello @ServeurpersoCom. Thank you for implementing tool calling. I'm very interested in it. I was trying to reproduce your 2+2 calculation example. Unfortunately, the tool-calling part wasn't working, as shown below.

I wonder how I could attach a script to perform the actual computation. Do I need a browser script that detects the tool call's clipboard copy button and sends the tool-call response to an API? Or is there an API endpoint that my script/program needs to listen on for tool-calling events so that it can perform the computation?

ServeurpersoCom · 2025-11-17T15:28:32Z

This is a tool_calls debugger only, and it works. In your screenshot you get a "simple_addition_tool" tag under the (empty) assistant message. Hover over it or click it to read the function call written by your model!

ServeurpersoCom · 2025-11-17T15:42:39Z

https://github.com/user-attachments/assets/ac188e22-9bbf-48a0-8e12-f655ec5a4ecd
We’re working hard with Alek on the MCP client. Here’s what it does in dev

SadaleNet · 2025-11-17T15:42:50Z

Oh, OK. So, at this point, is there any way I could actually execute the calculation function with the UI provided by llama-server? I've got a CLI program working that can do the calculation using llama-server's OpenAI-compatible API. I don't want to reinvent the UI if it already exists.

ServeurpersoCom · 2025-11-17T15:50:53Z

Yes, you don’t need to reinvent the wheel. But the UI is still in development, and it’s a heavy piece of work. That’s exactly why MCP exists: the model emits tool calls, you wrap them, and you send them to an MCP server, whose result is returned into the context. If you’re comfortable with sysadmin work, I can give you what you need.
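
The round trip described above (the model emits a tool call, something executes it, and the result goes back into the context) can be sketched in Python. This is a simplified illustration under stated assumptions, not the MCP client under development: it dispatches to a local function instead of an MCP server, and `TOOLS` and `run_tool_call` are hypothetical names. The `tool` role message with a `tool_call_id` follows the OpenAI-compatible chat format.

```python
import json


def simple_addition_tool(a, b):
    """Local implementation of the dummy calculator tool from the test setup."""
    return a + b


# Hypothetical registry mapping tool names to Python callables.
TOOLS = {"simple_addition_tool": simple_addition_tool}


def run_tool_call(call):
    """Execute one assistant tool call and build the matching `tool` message."""
    fn = call["function"]
    try:
        args = json.loads(fn["arguments"] or "{}")
        content = json.dumps(TOOLS[fn["name"]](**args))
    except Exception as exc:  # unknown tool, bad JSON, wrong arguments
        content = json.dumps({"error": str(exc)})
    return {"role": "tool", "tool_call_id": call["id"], "content": content}
```

A client would append the assistant message containing `tool_calls` to its message list, append the message returned by `run_tool_call`, and send the list back to the chat completions endpoint so the model can produce its final answer.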

SadaleNet · 2025-11-17T15:59:24Z

Oh sure. Tell me more. I just need a starting point. Particularly on how to intercept the model's tool call signal and return the appropriate result to the model while using the UI of llama-server. Just to confirm, would this mechanism work in the current master branch of llama.cpp?

As for MCP, if it's not absolutely required, I guess I can explore that on my own later. :P

Again, thanks a lot for working on this feature. People like me are very grateful for your work.

EDIT: Oh wait. Did you actually mean that the UI isn't ready and I have to use other methods to get the tool-calling mechanism working for now?

ServeurpersoCom · 2025-11-17T16:29:48Z

Absolutely

SadaleNet · 2025-11-19T14:22:23Z

I've figured out how to get this feature to work and created a repo for it: https://github.com/SadaleNet/llamacpp-tool-calling-python

Again, thanks for your groundwork. My script wouldn't work without your prior work.

ServeurpersoCom · 2025-11-19T16:18:28Z

Cool!

You can try this one; it's integrated in TypeScript. Just edit the config.js file to configure your MCP server. It's still under development; for the time being, I've put my MCP server's URL directly into it (easy to find with git grep). :) It supports WebSocket and Streamable-HTTP, and a configurable agentic loop (a chain of tool calls).

https://github.com/ServeurpersoCom/llama.cpp/tree/mcp-client-alpha I'll add full UI settings soon.

I also have a full backend version: a Node.js OAI-Compat reverse proxy supporting stdio/ws/streamable-http transports (testing-branch18 and later ones). It's like llama-swap, but for adding tools to any basic OAI client (bot, "jarvis"-style voice assistant, etc.)

Both handle the context well and allow long chains of complex development work. If you plug in an MCP server like the one I use as an example, it performs almost as well as Claude at computer use, depending on the model.
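
To make the idea of a configurable agentic loop concrete, here is a small Python sketch. It is an illustration only, not code from the mcp-client-alpha branch or the reverse proxy. `agentic_loop`, `send`, and `max_rounds` are hypothetical names; `send` stands for any callable that posts the message list to an OpenAI-compatible chat endpoint and returns the assistant message, and tools are plain local callables rather than MCP servers.

```python
import json


def agentic_loop(send, messages, tools, max_rounds=5):
    """Chain tool calls until the model answers without one, or the budget runs out.

    send(messages) -> assistant message dict (hypothetical transport wrapper).
    tools          -> dict mapping tool names to callables.
    """
    for _ in range(max_rounds):
        reply = send(messages)
        messages.append(reply)
        calls = reply.get("tool_calls") or []
        if not calls:
            return reply  # plain answer: the chain is finished
        for call in calls:
            fn = call["function"]
            try:
                args = json.loads(fn["arguments"] or "{}")
                content = json.dumps(tools[fn["name"]](**args))
            except Exception as exc:  # report failures back to the model
                content = json.dumps({"error": str(exc)})
            messages.append(
                {"role": "tool", "tool_call_id": call["id"], "content": content}
            )
    return None  # round budget exhausted without a final answer
```

Injecting `send` keeps the loop independent of the transport, and `max_rounds` bounds how long a chain of tool calls can run before the loop gives up.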

ServeurpersoCom requested a review from allozaur as a code owner October 16, 2025 17:17

github-actions bot added examples server labels Oct 16, 2025

ServeurpersoCom mentioned this pull request Oct 16, 2025

Feature Request: Add a debug option to display OpenAI-Compatible toolcall chunks in the WebUI #16597

Closed

4 tasks

ServeurpersoCom marked this pull request as draft October 17, 2025 08:26

ServeurpersoCom marked this pull request as ready for review October 18, 2025 21:09

ServeurpersoCom marked this pull request as draft October 18, 2025 21:10

ServeurpersoCom force-pushed the harmony-toolcall-debug-option branch 2 times, most recently from 0fe776d to 02df5a1 Compare October 18, 2025 21:21

ServeurpersoCom marked this pull request as ready for review October 18, 2025 21:22

ServeurpersoCom force-pushed the harmony-toolcall-debug-option branch from 02df5a1 to a5cff84 Compare October 22, 2025 17:00

ServeurpersoCom force-pushed the harmony-toolcall-debug-option branch from aad02d8 to d2399e9 Compare October 31, 2025 12:09

DajanaV mentioned this pull request Oct 31, 2025

UPSTREAM PR #16618: webui: add OAI-Compat Harmony tool-call streaming visualization and persistence in chat UI auroralabs-loci/llama.cpp#24

Closed

ServeurpersoCom force-pushed the harmony-toolcall-debug-option branch from d2399e9 to 57e7100 Compare November 1, 2025 19:30

ServeurpersoCom mentioned this pull request Nov 3, 2025

common: Generalized XML-style tool-call parsing with streaming support (GLM 4.5/4.6 + MiniMax M2 + SeedOSS + Kimi-K2 + Qwen3-Coder + Apriel-1.5 + Xiaomi-MiMo) #16932

Merged

ServeurpersoCom closed this Nov 8, 2025

ServeurpersoCom force-pushed the harmony-toolcall-debug-option branch from dc8ac21 to eeee367 Compare November 8, 2025 16:49

ServeurpersoCom reopened this Nov 8, 2025

DajanaV mentioned this pull request Nov 8, 2025

UPSTREAM PR #16618: webui: add OAI-Compat Harmony tool-call streaming visualization and persistence in chat UI auroralabs-loci/llama.cpp#138

Closed

allozaur requested changes Nov 13, 2025

tools/server/webui/src/lib/components/app/chat/ChatMessages/ChatMessageAssistant.svelte (2 review threads, both outdated and resolved)

ServeurpersoCom force-pushed the harmony-toolcall-debug-option branch from 7b1b1cc to 0ba18eb Compare November 13, 2025 19:10

allozaur approved these changes Nov 13, 2025

DajanaV mentioned this pull request Nov 13, 2025

UPSTREAM PR #16618: webui: add OAI-Compat Harmony tool-call streaming visualization and persistence in chat UI auroralabs-loci/llama.cpp#200

Open

ServeurpersoCom and others added 5 commits November 14, 2025 21:32

a773caf webui: remove scroll listener causing unnecessary layout updates (model selector)

9f1cdb3 Update tools/server/webui/src/lib/components/app/chat/ChatMessages/ChatMessageAssistant.svelte (Co-authored-by: Aleksander Grygier <[email protected]>)

291edb0 Update tools/server/webui/src/lib/components/app/chat/ChatMessages/ChatMessageAssistant.svelte (Co-authored-by: Aleksander Grygier <[email protected]>)

73e4023 chore: npm run format & update webui build output

ServeurpersoCom force-pushed the harmony-toolcall-debug-option branch from 0ba18eb to 73e4023 Compare November 14, 2025 20:33

b1b7ecf chore: update webui build output

allozaur merged commit 1411d92 into ggml-org:master Nov 15, 2025
14 checks passed
