This repository automatically builds llama.cpp with CUDA support for multiple NVIDIA GPU architectures and CUDA versions.
The official llama.cpp repository does not provide pre-built CUDA binaries for Linux. This repository fills that gap by:
- Building llama.cpp with CUDA support for multiple CUDA toolkit versions
- Supporting a wide range of NVIDIA GPU architectures (compute capability 6.1 and newer)
- Automatically tracking upstream llama.cpp releases
- Providing ready-to-use binaries via GitHub releases
Binaries are built for the following CUDA toolkit versions:

- CUDA 12.4
- CUDA 12.6
- CUDA 12.8
- CUDA 12.9
- CUDA 13.0
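To check which CUDA toolkit, if any, is already installed locally (this assumes `nvcc` is on your PATH):

```bash
# Report the locally installed CUDA toolkit version.
nvcc --version
```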
GPU architecture support by CUDA version:

| Compute Capability | GPU Examples | CUDA 12.4/12.6 | CUDA 12.8/12.9 | CUDA 13.0 |
|---|---|---|---|---|
| 6.1 | Titan Xp, Tesla P40, GTX 10xx | ✅ | ✅ | ❌ |
| 7.0 | Tesla V100 | ✅ | ✅ | ❌ |
| 7.5 | Tesla T4, RTX 2000 series, Quadro RTX | ✅ | ✅ | ✅ |
| 8.0 | A100 | ✅ | ✅ | ✅ |
| 8.6 | RTX 3000 series | ✅ | ✅ | ✅ |
| 8.9 | RTX 4000 series, L4, L40 | ✅ | ✅ | ✅ |
| 9.0 | H100, H200 | ✅ | ✅ | ✅ |
| 10.0 | B200 | ❌ | ✅ | ✅ |
| 12.0 | RTX Pro series, RTX 5000 series | ❌ | ✅ | ✅ |

Note: CUDA 13.0 drops support for compute capability below 7.5, so GPUs at 6.1 and 7.0 are covered only up to the CUDA 12.9 builds.
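To find your GPU's row in this table, recent drivers expose the compute capability directly through nvidia-smi (the `compute_cap` query field requires a fairly recent driver):

```bash
# List each GPU with its compute capability, e.g. "NVIDIA L4, 8.9"
nvidia-smi --query-gpu=name,compute_cap --format=csv
```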
To install a release:

- Go to the Releases page
- Download the tarball for your CUDA version (e.g., `llama.cpp-bXXXX-cuda-12.6.tar.gz`)
- Extract the archive:

```bash
tar -xzf llama.cpp-bXXXX-cuda-12.6.tar.gz
cd cuda-12.6
```

The extracted directory contains all llama.cpp binaries:
```bash
# Run the main CLI
./llama-cli --help

# Run the server
./llama-server --help

# Other utilities
./llama-bench
./llama-quantize
./llama-embedding
```
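As a quick sanity check after extraction, you can point llama-server at any local GGUF model; the model path below is a placeholder, and `-ngl 99` offloads all layers to the GPU:

```bash
# Placeholder model path - substitute a real GGUF file you have downloaded.
./llama-server -m ./models/your-model.gguf --port 8080 -ngl 99
```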
Each release includes a VERSION.txt file with build information:

```bash
cat VERSION.txt
```

Running the binaries requires:

- NVIDIA GPU with compute capability 6.1 or higher (7.5 or higher for the CUDA 13.0 build)
- Appropriate NVIDIA driver for your CUDA version:
- CUDA 12.4: Driver >= 550.54
- CUDA 12.6: Driver >= 560.28
- CUDA 12.8: Driver >= 570.26
- CUDA 12.9: Driver >= 575.51
- CUDA 13.0: Driver >= 580.65
- Linux x86_64 (Ubuntu 22.04 compatible)
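To verify your setup against these requirements, nvidia-smi reports the installed driver version (its banner also shows the highest CUDA version that driver supports):

```bash
# Print the installed driver version, e.g. "560.35.03"
nvidia-smi --query-gpu=driver_version --format=csv,noheader
```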
Builds are triggered automatically:
- Daily at 00:00 UTC
- Only if a new llama.cpp release is detected
- Can be manually triggered via GitHub Actions
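For example, with the GitHub CLI installed and write access to the repository, a manual run can be dispatched against the workflow file referenced later in this README:

```bash
# Manually dispatch the build workflow via GitHub Actions.
gh workflow run build-cuda.yml
```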
Each build:
- Checks for new llama.cpp releases
- Clones llama.cpp at the exact release commit
- Builds with CMake using CUDA Docker images
- Packages binaries for each CUDA version
- Creates a GitHub release with all build artifacts
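For reference, the per-version build boils down to something like the sketch below. The release tag and architecture list here are illustrative, and the workflow file remains authoritative; llama.cpp's CUDA backend is enabled via the `GGML_CUDA` CMake option:

```bash
# Illustrative sketch - the workflow pins the real release tag and arch list.
git clone --depth 1 --branch bXXXX https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_CUDA=ON \
      -DCMAKE_CUDA_ARCHITECTURES="61;70;75;80;86;89;90"
cmake --build build --config Release -j
```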
Select a CUDA version based on:
- Your GPU architecture: Blackwell GPUs require CUDA 12.8+
- Your installed CUDA toolkit: match the version if possible
- Your NVIDIA driver: make sure it supports the chosen CUDA version
If unsure, CUDA 12.6 offers the widest compatibility with modern GPUs (except Blackwell, which needs 12.8+).
If you need a custom build:
```bash
git clone https://github.com/ai-dock/llama.cpp-cuda
cd llama.cpp-cuda

# Edit .github/workflows/build-cuda.yml to customize architectures or CUDA versions
# Then trigger a manual workflow run
```

This repository contains build scripts only. The llama.cpp binaries are subject to the llama.cpp MIT License.
Related links:

- Upstream llama.cpp: https://github.com/ggml-org/llama.cpp
- CUDA Toolkit: https://developer.nvidia.com/cuda-toolkit
- NVIDIA Driver Downloads: https://www.nvidia.com/download/index.aspx
For issues with:
- Build process or binaries: Open an issue in this repository
- llama.cpp functionality: Open an issue in the upstream repository