webui: add OAI-Compat Harmony tool-call streaming visualization and persistence in chat UI #16618
Conversation
I have to do a little cleaning, the patch was not merged properly on my side. -> draft

This PR is now clean, but it was developed after this one: #16562
Force-pushed 0fe776d to 02df5a1
Alright, @ServeurpersoCom, let's move forward with this one after merging #16562 ;) Let me know when you've addressed the merge conflicts and I'll gladly review the code.
Force-pushed 02df5a1 to a5cff84
Feel free to dissect the architecture as deep as you want! Component boundaries, store coupling, service layering, anything that smells non-idiomatic.

And we could even imagine the architecture being reusable later: like having a small JavaScript execution module decoupled from the UI, so the model could actually interact with a JS thread it coded itself.

Includes a very small optimization from the previous PR (scroll listener removal). It landed here intentionally :D
Force-pushed aad02d8 to d2399e9

Force-pushed d2399e9 to 57e7100
Force-pushed dc8ac21 to eeee367
allozaur left a comment:
Just a few cosmetics.
Review threads on tools/server/webui/src/lib/components/app/chat/ChatMessages/ChatMessageAssistant.svelte (two comments, both outdated and resolved)
Force-pushed 7b1b1cc to 0ba18eb
Rebase / Format / Build

@ServeurpersoCom please rebase & rebuild
…and persistence in chat UI

- Purely visual and diagnostic change; no effect on model context, prompt construction, or inference behavior
- Captured assistant tool call payloads during streaming and non-streaming completions, and persisted them in chat state and storage for downstream use
- Exposed parsed tool call labels beneath the assistant's model info line, with graceful fallback when parsing fails
- Added tool call badges beneath assistant responses that expose JSON tooltips and copy their payloads when clicked, matching the existing model badge styling
- Added a user-facing setting to toggle tool call visibility in the Developer settings section, directly under the model selector option
…atMessageAssistant.svelte Co-authored-by: Aleksander Grygier <[email protected]>
…atMessageAssistant.svelte Co-authored-by: Aleksander Grygier <[email protected]>
Force-pushed 0ba18eb to 73e4023
rebased and rebuilt
…ersistence in chat UI (ggml-org#16618)

* webui: add OAI-Compat Harmony tool-call live streaming visualization and persistence in chat UI
  - Purely visual and diagnostic change; no effect on model context, prompt construction, or inference behavior
  - Captured assistant tool call payloads during streaming and non-streaming completions, and persisted them in chat state and storage for downstream use
  - Exposed parsed tool call labels beneath the assistant's model info line, with graceful fallback when parsing fails
  - Added tool call badges beneath assistant responses that expose JSON tooltips and copy their payloads when clicked, matching the existing model badge styling
  - Added a user-facing setting to toggle tool call visibility in the Developer settings section, directly under the model selector option
* webui: remove scroll listener causing unnecessary layout updates (model selector)
* Update tools/server/webui/src/lib/components/app/chat/ChatMessages/ChatMessageAssistant.svelte (Co-authored-by: Aleksander Grygier <[email protected]>)
* Update tools/server/webui/src/lib/components/app/chat/ChatMessages/ChatMessageAssistant.svelte (Co-authored-by: Aleksander Grygier <[email protected]>)
* chore: npm run format & update webui build output
* chore: update webui build output

Co-authored-by: Aleksander Grygier <[email protected]>
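The "captured assistant tool call payloads during streaming" item boils down to merging incremental `tool_calls` deltas from OAI-compatible stream chunks into complete call objects before anything can be rendered or persisted. A minimal Python sketch of that accumulation logic (the webui does this in TypeScript; field names follow the OpenAI chat-completions delta format, and the sample deltas are illustrative, not taken from this PR's code):

```python
# Sketch: accumulate OpenAI-style streamed tool_call deltas into complete
# tool call objects. Mirrors what a chat UI must do before it can render
# a tool-call badge; not the actual webui implementation.

def merge_tool_call_deltas(chunks):
    """chunks: iterable of delta dicts as found in chunk.choices[0].delta."""
    calls = {}  # index -> accumulated tool call
    for delta in chunks:
        for tc in delta.get("tool_calls", []):
            idx = tc["index"]
            entry = calls.setdefault(idx, {
                "id": "", "type": "function",
                "function": {"name": "", "arguments": ""},
            })
            if tc.get("id"):
                entry["id"] = tc["id"]
            fn = tc.get("function", {})
            if fn.get("name"):
                entry["function"]["name"] += fn["name"]
            if fn.get("arguments"):
                # Argument JSON arrives in fragments; concatenate, parse at the end.
                entry["function"]["arguments"] += fn["arguments"]
    return [calls[i] for i in sorted(calls)]


deltas = [
    {"tool_calls": [{"index": 0, "id": "call_1",
                     "function": {"name": "simple_addition_tool", "arguments": ""}}]},
    {"tool_calls": [{"index": 0, "function": {"arguments": '{"a": 2,'}}]},
    {"tool_calls": [{"index": 0, "function": {"arguments": ' "b": 2}'}}]},
]
merged = merge_tool_call_deltas(deltas)
print(merged[0]["function"]["name"])       # simple_addition_tool
print(merged[0]["function"]["arguments"])  # {"a": 2, "b": 2}
```

Once merged, the complete objects are what gets persisted in chat state and shown in the badge tooltip.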
Hello @ServeurpersoCom. Thank you for implementing tool-calling; I'm highly interested in it. I was trying to reproduce your 2+2 calculation example. Unfortunately, the tool-calling part wasn't working, as shown below. I wonder how I could attach a script to perform the actual computation. Do I need to use a browser script to detect the presence of the tool-call clipboard-copy button and send the tool-call response to an API? Or is there an API endpoint that my script/program needs to listen to for tool-call events so that it can perform the computation?
It's a tool_calls debugger only, and it works. On your screenshot you get a "simple_addition_tool" tag under the (empty) assistant message. Hover or click it to read the function call your model wrote!
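For reference, a hypothetical payload of the kind the badge tooltip exposes (and the click-to-copy action puts on the clipboard), in the OpenAI-compatible shape; the tool name and argument values here are illustrative:

```python
import json

# Hypothetical assistant message carrying a tool call, as emitted by an
# OAI-compatible server. The content is often empty/None when the model
# decides to call a tool instead of answering directly.
assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_abc123",
        "type": "function",
        "function": {
            "name": "simple_addition_tool",
            "arguments": '{"a": 2, "b": 2}',  # arguments arrive as a JSON string
        },
    }],
}

call = assistant_message["tool_calls"][0]
args = json.loads(call["function"]["arguments"])
print(call["function"]["name"], args)  # simple_addition_tool {'a': 2, 'b': 2}
```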
https://github.com/user-attachments/assets/ac188e22-9bbf-48a0-8e12-f655ec5a4ecd
Oh, OK. So, at this point, is there any way I could actually execute the calculation function with the UI provided by llama-server? I've got a CLI program working that can do the calculation using the OpenAI-compatible API of llama-server. I don't want to reinvent the UI if it already exists.
Yes, you don't need to reinvent the wheel. But the UI is still in development, and it's a heavy piece of work. That's exactly why MCP exists: the model emits tool calls, you wrap them, and you send them to an MCP server that returns the result into the context. If you're comfortable with sysadmin work, I can give you what you need.
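That wrapping step can be sketched as a plain client-side loop: detect `tool_calls` in the response, run the tool, append a `role: "tool"` message with the result, and ask the model again. A minimal Python sketch under stated assumptions: message shapes follow the OpenAI chat-completions convention, and `fake_complete` is a scripted stand-in for an actual HTTP call to llama-server (the tool table stands in for an MCP server):

```python
import json

# Local stub standing in for an MCP server / real tool implementation.
TOOLS = {"simple_addition_tool": lambda a, b: a + b}

def run_tool_loop(messages, complete):
    """complete(messages) -> assistant message dict (OAI-compatible shape).
    Executes tool calls until the model returns plain content."""
    while True:
        reply = complete(messages)
        messages.append(reply)
        tool_calls = reply.get("tool_calls")
        if not tool_calls:
            return reply["content"]
        for call in tool_calls:
            fn = call["function"]
            result = TOOLS[fn["name"]](**json.loads(fn["arguments"]))
            # Feed the result back into the context as a "tool" message.
            messages.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": json.dumps({"result": result}),
            })

# Scripted fake model: first turn emits a tool call, second turn answers.
def fake_complete(messages):
    if messages[-1]["role"] == "tool":
        result = json.loads(messages[-1]["content"])["result"]
        return {"role": "assistant", "content": f"2 + 2 = {result}"}
    return {"role": "assistant", "content": None, "tool_calls": [{
        "id": "call_1", "type": "function",
        "function": {"name": "simple_addition_tool",
                     "arguments": '{"a": 2, "b": 2}'},
    }]}

print(run_tool_loop([{"role": "user", "content": "What is 2+2?"}], fake_complete))
# 2 + 2 = 4
```

Swapping `fake_complete` for a real POST to the server's OpenAI-compatible chat-completions endpoint, and the tool table for MCP calls, gives the shape of the CLI approach discussed here.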
Oh sure. Tell me more. I just need a starting point, particularly on how to intercept the model's tool-call signal and return the appropriate result to the model while using the UI of llama-server. Just to confirm, would this mechanism work in the current master branch of llama.cpp? As for MCP, if it's not absolutely required, I guess I can explore that on my own later. :P Again, thanks a lot for working on this feature. People like me are highly thankful for your work. EDIT: Oh wait. Did you actually mean that the UI isn't ready and I have to use other methods to get the tool-calling mechanism working for now?
Absolutely |
I've figured out how to get this feature to work and created a repo for it: https://github.com/SadaleNet/llamacpp-tool-calling-python Again, thanks for your groundwork; my script wouldn't be working without your prior work.
Cool! You can try this one; it's integrated in TypeScript. Just edit the config.js file to configure your MCP server. It's still under development; for the time being, I've put my MCP server's URL directly into it (easy to find with https://github.com/ServeurpersoCom/llama.cpp/tree/mcp-client-alpha). I'll add full UI settings soon.

I also have a full backend Node.js OAI-Compat reverse proxy version supporting stdio/ws/streamable-http transports (testing-branch18 and later). It's like llama-swap, but for adding tools to any basic OAI client (bot, "jarvis"-style voice assistant, etc.).

Both handle the context well; they allow for long chains of complex development. If you plug in an MCP server like the one I used as an example, it performs almost as well as Claude in computer use, depending on the model.
Close #16597