
Set up the llama.cpp server for Linux

Code completion server

Used for
- code completion

LLM type
- FIM (fill in the middle)

Instructions

  1. Download the release files for your OS from the llama.cpp releases page (or build from source).
  2. Download the LLM model and run the llama.cpp server (combined in one command):

CPU only

llama-server -hf ggml-org/Qwen2.5-Coder-1.5B-Q8_0-GGUF --port 8012 -ub 512 -b 512 --ctx-size 0 --cache-reuse 256

With an Nvidia GPU and CUDA drivers installed

  • more than 16GB VRAM
    llama-server --fim-qwen-7b-default
  • less than 16GB VRAM
    llama-server --fim-qwen-3b-default
  • less than 8GB VRAM
    llama-server --fim-qwen-1.5b-default
    If the model file is not available locally (the first time), it will be downloaded (this can take some time); after that the llama.cpp server starts. You can verify the server as shown below.
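
To check that the completion server is running, you can query it directly. This is a minimal sketch: /health and /infill are endpoints exposed by the llama.cpp server, but the exact /infill request fields may differ between versions; adjust the port if you changed it.

curl http://localhost:8012/health

curl http://localhost:8012/infill -H "Content-Type: application/json" -d '{"input_prefix": "def add(a, b):\n    return ", "input_suffix": "\n"}'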

Chat server

Used for
- Chat with AI
- Chat with AI with project context
- Edit with AI
- Generate commit message

LLM type
- Chat Models

Instructions
Same as for the code completion server, but use a chat model and slightly different parameters.

CPU only
llama-server -hf ggml-org/Qwen2.5-Coder-1.5B-Instruct-Q8_0-GGUF --port 8011

With an Nvidia GPU and CUDA drivers installed

  • more than 16GB VRAM
    llama-server -hf ggml-org/Qwen2.5-Coder-7B-Instruct-Q8_0-GGUF --port 8011
  • less than 16GB VRAM
    llama-server -hf ggml-org/Qwen2.5-Coder-3B-Instruct-Q8_0-GGUF --port 8011
  • less than 8GB VRAM
    llama-server -hf ggml-org/Qwen2.5-Coder-1.5B-Instruct-Q8_0-GGUF --port 8011
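
To verify the chat server, you can send a test request to its OpenAI-compatible endpoint (a minimal example; /v1/chat/completions is provided by the llama.cpp server, and the model name can be omitted since the server already has the model loaded):

curl http://localhost:8011/v1/chat/completions -H "Content-Type: application/json" -d '{"messages": [{"role": "user", "content": "Write a hello world program in Python"}]}'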

Embeddings server

Used for
- Chat with AI with project context

LLM type
- Embedding

Instructions
Same as for the code completion server, but use an embeddings model and slightly different parameters.
llama-server -hf ggml-org/Nomic-Embed-Text-V2-GGUF --port 8010 -ub 2048 -b 2048 --ctx-size 2048 --embeddings
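
To verify the embeddings server, you can send a test request to its OpenAI-compatible embeddings endpoint (a minimal example, assuming the default /v1/embeddings route enabled by the --embeddings flag; the response contains an embedding vector for the input text):

curl http://localhost:8010/v1/embeddings -H "Content-Type: application/json" -d '{"input": "Hello world"}'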
