This repository automatically builds llama.cpp with CUDA support for multiple NVIDIA GPU architectures and CUDA versions.
The official llama.cpp repository does not provide pre-built CUDA binaries for Linux. This repository fills that gap by:
- Building llama.cpp with CUDA support for multiple CUDA toolkit versions
- Supporting a wide range of NVIDIA GPU architectures (compute capability 6.1 and newer)
- Automatically tracking upstream llama.cpp releases
- Providing ready-to-use binaries via GitHub releases
Binaries are built for the following CUDA toolkit versions:

- CUDA 12.4
- CUDA 12.6
- CUDA 12.8
- CUDA 12.9
- CUDA 13.0
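To check which CUDA toolkit, if any, is already installed locally (this assumes `nvcc` is on your PATH):

```bash
# Report the locally installed CUDA toolkit version.
nvcc --version
```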
GPU architecture support by CUDA version:

| Compute Capability | GPU Examples | CUDA 12.4/12.6 | CUDA 12.8/12.9 | CUDA 13.0 |
|---|---|---|---|---|
| 6.1 | Titan Xp, Tesla P40, GTX 10xx | ✅ | ✅ | ❌ |
| 7.0 | Tesla V100 | ✅ | ✅ | ❌ |
| 7.5 | Tesla T4, RTX 2000 series, Quadro RTX | ✅ | ✅ | ✅ |
| 8.0 | A100 | ✅ | ✅ | ✅ |
| 8.6 | RTX 3000 series | ✅ | ✅ | ✅ |
| 8.9 | RTX 4000 series, L4, L40 | ✅ | ✅ | ✅ |
| 9.0 | H100, H200 | ✅ | ✅ | ✅ |
| 10.0 | B200 | ❌ | ✅ | ✅ |
| 12.0 | RTX Pro series, RTX 5000 series | ❌ | ✅ | ✅ |

Note: CUDA 13.0 drops support for compute capability below 7.5, so GPUs at 6.1 and 7.0 are covered only up to the CUDA 12.9 builds.
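To find your GPU's row in this table, recent drivers expose the compute capability directly through nvidia-smi (the `compute_cap` query field requires a fairly recent driver):

```bash
# List each GPU with its compute capability, e.g. "NVIDIA L4, 8.9"
nvidia-smi --query-gpu=name,compute_cap --format=csv
```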
To install a release:

- Go to the Releases page
- Download the tarball for your CUDA version (e.g., `llama.cpp-bXXXX-cuda-12.6.tar.gz`)
- Extract the archive:

```bash
tar -xzf llama.cpp-bXXXX-cuda-12.6.tar.gz
cd cuda-12.6
```

The extracted directory contains all llama.cpp binaries:
```bash
# Run the main CLI
./llama-cli --help

# Run the server
./llama-server --help

# Other utilities
./llama-bench
./llama-quantize
./llama-embedding
```
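As a quick sanity check after extraction, you can point llama-server at any local GGUF model; the model path below is a placeholder, and `-ngl 99` offloads all layers to the GPU:

```bash
# Placeholder model path - substitute a real GGUF file you have downloaded.
./llama-server -m ./models/your-model.gguf --port 8080 -ngl 99
```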
Each release includes a VERSION.txt file with build information:

```bash
cat VERSION.txt
```

Running the binaries requires:

- NVIDIA GPU with compute capability 6.1 or higher (7.5 or higher for the CUDA 13.0 build)
- Appropriate NVIDIA driver for your CUDA version:
- CUDA 12.4: Driver >= 550.54
- CUDA 12.6: Driver >= 560.28
- CUDA 12.8: Driver >= 570.26
- CUDA 12.9: Driver >= 575.51
- CUDA 13.0: Driver >= 580.65
- Linux x86_64 (Ubuntu 22.04 compatible)
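To verify your setup against these requirements, nvidia-smi reports the installed driver version (its banner also shows the highest CUDA version that driver supports):

```bash
# Print the installed driver version, e.g. "560.35.03"
nvidia-smi --query-gpu=driver_version --format=csv,noheader
```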
Builds are triggered automatically:
- Daily at 00:00 UTC
- Only if a new llama.cpp release is detected
- Can be manually triggered via GitHub Actions
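For example, with the GitHub CLI installed and write access to the repository, a manual run can be dispatched against the workflow file referenced later in this README:

```bash
# Manually dispatch the build workflow via GitHub Actions.
gh workflow run build-cuda.yml
```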
Each build:
- Checks for new llama.cpp releases
- Clones llama.cpp at the exact release commit
- Builds with CMake using CUDA Docker images
- Packages binaries for each CUDA version
- Creates a GitHub release with all build artifacts
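For reference, the per-version build boils down to something like the sketch below. The release tag and architecture list here are illustrative, and the workflow file remains authoritative; llama.cpp's CUDA backend is enabled via the `GGML_CUDA` CMake option:

```bash
# Illustrative sketch - the workflow pins the real release tag and arch list.
git clone --depth 1 --branch bXXXX https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_CUDA=ON \
      -DCMAKE_CUDA_ARCHITECTURES="61;70;75;80;86;89;90"
cmake --build build --config Release -j
```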
Select a CUDA version based on:
- Your GPU architecture: Blackwell GPUs require CUDA 12.8+
- Your installed CUDA toolkit: match the version if possible
- Your NVIDIA driver: make sure it supports the chosen CUDA version
If unsure, CUDA 12.6 offers the widest compatibility with modern GPUs (except Blackwell, which needs 12.8+).
If you need a custom build:
```bash
git clone https://github.com/ai-dock/llama.cpp-cuda
cd llama.cpp-cuda

# Edit .github/workflows/build-cuda.yml to customize architectures or CUDA versions
# Then trigger a manual workflow run
```

This repository contains build scripts only. The llama.cpp binaries are subject to the llama.cpp MIT License.
Related links:

- Upstream llama.cpp: https://github.com/ggml-org/llama.cpp
- CUDA Toolkit: https://developer.nvidia.com/cuda-toolkit
- NVIDIA Driver Downloads: https://www.nvidia.com/download/index.aspx
For issues with:
- Build process or binaries: Open an issue in this repository
- llama.cpp functionality: Open an issue in the upstream repository