
## About

These guides are meant to help you get started quickly with llama-swap, using ready-made configuration snippets.

> [!TIP]
> Looking for help with the Configuration page? It was written to be LLM-friendly. Try copy/pasting the example config into an LLM first to see if it can answer your question.
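
For orientation, here is a minimal sketch of what a llama-swap `config.yaml` can look like. The model name, file path, and `ttl` value are placeholder assumptions; adapt them to your setup and see the guides below for verified, model-specific configurations.

```yaml
# Minimal llama-swap config sketch (illustrative values, not a verified guide)
models:
  "qwen3-30b-a3b":
    # llama-swap substitutes ${PORT} with the port it assigns to this server
    cmd: llama-server --port ${PORT} -m /models/Qwen3-30B-A3B.gguf
    ttl: 300  # optional: unload the model after 300s of inactivity
```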

## Model Guides

| Company | Model | VRAM Requirement | Server | Notes | Link |
|---------|-------|------------------|--------|-------|------|
| BGE | reranker v2 m3 | 343MB | llama.cpp | v1/rerank API with llama-server | link |
| Google | Gemma 3 27B | 24GB to 27GB | llama.cpp | 100K context on single and dual 24GB GPUs | link |
| Meta | llama-3.3-70B | 55GB | llama.cpp | 13 to 20 tok/sec with 2x3090 and P40 for speculative decoding | link |
| Meta | llama4-scout | 68.62GB | llama.cpp | Fully loading Scout with 62K context onto 3x24GB GPUs | link |
| Mistral | Small 3.1 | 24GB | llama.cpp | Text and vision support, 32K context | link |
| Nomic-AI | nomic-embed-text v1.5 | 280MB | llama.cpp | v1/embeddings with llama-server | link |
| OpenAI | whisper-large-v3-turbo | 1.4GB | whisper.cpp | v1/audio/transcriptions speech-to-text with whisper.cpp | link |
| Qwen | qwen3-30b-a3b | 24GB | llama.cpp | 113 tok/s on a 3090 | link |
| Qwen | QwQ, Coder 32B | 24GB to 48GB | llama.cpp | Local copilot with Aider, QwQ, and Qwen2.5 Coder 32B | link |
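
Once a model from the table above is configured, llama-swap serves the OpenAI-compatible API and starts whichever backend matches the requested model name. A rough usage sketch, assuming llama-swap is listening on `localhost:8080` and your config has a model entry named `qwen3-30b-a3b`:

```sh
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "qwen3-30b-a3b",
        "messages": [{"role": "user", "content": "Hello!"}]
      }'
```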

## Community Contributed Guides

> [!NOTE]
> These guides are contributed by community members and have not been verified by the llama-swap maintainers.

| Contributor | Guide |
|-------------|-------|
| @WesleyFister | docker-compose example with built-in config.yaml |
| @ramblingcoder | Docker-in-Docker setup with llama-swap |

## Use Case Guides
