Description
Hi,
first of all, thank you for the awesome project — really cool!
Describe the bug
I'm currently experimenting with the new @blocknote/xl-ai package, using the Vercel AI SDK's OpenAI-compatible provider to connect to a local llama.cpp server. I'm running smaller models from Hugging Face, e.g. to summarize personal notes.
These HF models typically include a chat_template in their tokenizer_config.json, which is carried over when converting to .gguf format during quantization. The default chat_template often expects strictly alternating user/assistant messages (see examples below). With default settings, this causes the llama.cpp server to return an HTTP 500 error when the request contains multiple system-role messages.
Examples:
- Tokenizer from Mistral-7B
- GGUF from Gemma-4B
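To make the failure mode concrete, here is a small sketch of the constraint such strict templates enforce. This is purely illustrative and not code from either project; the real check lives in the model's Jinja chat_template:

```typescript
type Role = "system" | "user" | "assistant";

// A strict template typically allows at most one leading system message,
// followed by strictly alternating user/assistant turns. Anything else
// makes the template raise, which llama.cpp surfaces as an HTTP 500.
function isStrictlyAlternating(roles: Role[]): boolean {
  let i = 0;
  if (roles[0] === "system") i = 1; // single optional leading system turn
  let expected: Role = "user";
  for (; i < roles.length; i++) {
    if (roles[i] === "system") return false; // extra system message → error
    if (roles[i] !== expected) return false; // non-alternating turn → error
    expected = expected === "user" ? "assistant" : "user";
  }
  return true;
}
```

A request with two system messages, as produced by the code linked below, fails this kind of check.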
The error seems to originate from this section of the code:
https://github.com/TypeCellOS/BlockNote/blob/main/packages/xl-ai/src/api/LLMRequest.ts#L168-L180
The comments around the code suggest the issue is already known.
I'm currently working around this with a custom fetch function passed to the OpenAI-compatible provider. It rewrites the request by merging all system messages into a single system message before sending it to the llama-server.
Another workaround would be to start the llama-server with the flags `--jinja --chat-template chatml`, which seems to work; however, I noticed that this breaks compatibility with the default llama.cpp web UI due to BOS/EOS token issues.
To Reproduce
- Download a GGUF model that includes a strict `chat_template`, such as Gemma 3B.
- Start the llama.cpp server with: `./llama-server.exe --model gemma-3b.gguf --port 8000 --jinja --cache-reuse 256 --ctx-size 8192`
- Use the `@blocknote/xl-ai` package with the OpenAI-compatible provider `@ai-sdk/openai-compatible`.
- Observe the 500 error from the server due to `chat_template` validation.
Misc
I'm not sure whether other LLM engines also run into this issue when using the default provided tokenizer_config.json or chat_template.
- Node version:
- Package manager:
- Browser:
- I'm a sponsor and would appreciate if you could look into this sooner than later 💖