llama.cpp CUDA Builds

This repository automatically builds llama.cpp with CUDA support for multiple NVIDIA GPU architectures and CUDA versions.

Why This Repository?

The official llama.cpp repository does not provide pre-built CUDA binaries. This repository fills that gap by:

  • Building llama.cpp with CUDA support for multiple CUDA toolkit versions
  • Supporting a wide range of NVIDIA GPU architectures (compute capability 6.1 and up; see the table below)
  • Automatically tracking upstream llama.cpp releases
  • Providing ready-to-use binaries via GitHub releases

Supported Configurations

CUDA Versions

  • CUDA 12.4
  • CUDA 12.6
  • CUDA 12.8
  • CUDA 12.9
  • CUDA 13.0

GPU Architectures

Compute Capability   GPU Examples                            CUDA 12.4/12.6   CUDA 12.8+
6.1                  Titan XP, Tesla P40, GTX 10xx           Yes              Up to 12.9
7.0                  Tesla V100                              Yes              Up to 12.9
7.5                  Tesla T4, RTX 2000 series, Quadro RTX   Yes              Yes
8.0                  A100                                    Yes              Yes
8.6                  RTX 3000 series                         Yes              Yes
8.9                  RTX 4000 series, L4, L40                Yes              Yes
9.0                  H100, H200                              Yes              Yes
10.0                 B200                                    No               Yes
12.0                 RTX Pro series, RTX 5000 series         No               Yes

Note: Support for compute capability < 7.5 ends with CUDA 12.9; the CUDA 13.0 builds target 7.5 and newer.

Usage

Download

  1. Go to the Releases page (https://github.com/ai-dock/llama.cpp-cuda/releases)
  2. Download the tarball for your CUDA version (e.g., llama.cpp-bXXXX-cuda-12.6.tar.gz)
  3. Extract the archive:
tar -xzf llama.cpp-bXXXX-cuda-12.6.tar.gz
cd cuda-12.6
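
For scripted or repeated downloads, the GitHub CLI can fetch release assets directly. A minimal sketch, assuming gh is installed and authenticated; the glob matches the 12.6 asset naming shown above:

# Download the CUDA 12.6 tarball from the latest release
gh release download --repo ai-dock/llama.cpp-cuda --pattern '*cuda-12.6.tar.gz'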

Run

The extracted directory contains all llama.cpp binaries:

# Run the main CLI
./llama-cli --help

# Run the server
./llama-server --help

# Other utilities
./llama-bench
./llama-quantize
./llama-embedding
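
As a quick smoke test, point the server at any local GGUF model. The path below is a placeholder; -m (model file), -ngl (number of layers to offload to the GPU), and --port are standard llama-server flags.

# Serve a local model with full GPU offload
./llama-server -m /path/to/model.gguf -ngl 99 --port 8080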

Check Version

Each release includes a VERSION.txt file with build information:

cat VERSION.txt

System Requirements

  • NVIDIA GPU with compute capability 6.1 or higher (7.5+ required for the CUDA 13.0 builds)
  • Appropriate NVIDIA driver for your CUDA version:
    • CUDA 12.4: Driver >= 550.54
    • CUDA 12.6: Driver >= 560.28
    • CUDA 12.8: Driver >= 570.26
    • CUDA 12.9: Driver >= 575.51
    • CUDA 13.0: Driver >= 580.65
  • Linux x86_64 (Ubuntu 22.04 compatible)
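
To confirm what your system provides, nvidia-smi can report the driver version and, on reasonably recent drivers, each GPU's compute capability:

# Driver version, GPU name, and compute capability
nvidia-smi --query-gpu=driver_version,name,compute_cap --format=csv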

Build Process

Builds are triggered automatically:

  • Daily at 00:00 UTC
  • Only if a new llama.cpp release is detected
  • Can be manually triggered via GitHub Actions

Each build:

  1. Checks for new llama.cpp releases
  2. Clones llama.cpp at the exact release commit
  3. Builds with CMake using CUDA Docker images (sketched after this list)
  4. Packages binaries for each CUDA version
  5. Creates a GitHub release with all build artifacts
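
The core CMake step looks roughly like the sketch below; the exact flags and architecture list are defined in .github/workflows/build-cuda.yml, and the values here are illustrative.

# Configure llama.cpp with CUDA enabled for a set of target architectures
cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES="75;80;86;89;90"

# Compile with all available cores
cmake --build build --config Release -j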

Choosing Your CUDA Version

Select based on:

  1. Your GPU architecture - Blackwell GPUs require CUDA 12.8+
  2. Your installed CUDA toolkit - Match the version if possible
  3. Your NVIDIA driver - Ensure your driver supports the CUDA version

If unsure, CUDA 12.6 offers the widest compatibility with modern GPUs (except Blackwell, which requires 12.8+).

Manual Building

If you need a custom build:

git clone https://github.com/ai-dock/llama.cpp-cuda
cd llama.cpp-cuda

# Edit .github/workflows/build-cuda.yml to customize architectures or CUDA versions
# Then trigger a manual workflow run
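
If you would rather build locally than through the workflow, the same build runs inside a CUDA devel container. A hedged sketch, where the image tag and flags are assumptions; no GPU is needed at build time:

# Build llama.cpp with CUDA inside an nvidia/cuda devel image
docker run --rm -v "$PWD:/work" -w /work nvidia/cuda:12.6.3-devel-ubuntu22.04 bash -c '
  apt-get update && apt-get install -y git cmake build-essential &&
  git clone https://github.com/ggml-org/llama.cpp &&
  cmake -S llama.cpp -B build -DGGML_CUDA=ON &&
  cmake --build build --config Release -j'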

License

This repository contains build scripts only. The llama.cpp binaries are subject to the llama.cpp MIT License.

Links

  • llama.cpp (upstream): https://github.com/ggml-org/llama.cpp
  • Releases: https://github.com/ai-dock/llama.cpp-cuda/releases

Support

For issues with:

  • Build process or binaries: Open an issue in this repository
  • llama.cpp functionality: Open an issue in the upstream repository

Credits

  • llama.cpp by Georgi Gerganov and contributors
