
Benchmark AMD GPUs on Raspberry Pi 5 #1

@geerlingguy

To get this to work, first you have to get an external AMD GPU working on Pi OS. The most up-to-date instructions are currently on my website: Get an AMD Radeon 6000/7000-series GPU running on Pi 5.

Once your AMD graphics card is working (and can output video), install dependencies and compile llama.cpp with the Vulkan backend:

# Install Vulkan SDK, glslc, and cmake
sudo apt install -y libvulkan-dev glslc cmake
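# Optional: verify the GPU is visible to Vulkan before building
# (vulkaninfo is provided by the vulkan-tools package)
sudo apt install -y vulkan-tools
vulkaninfo --summary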

# Clone llama.cpp
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp

# Build with Vulkan
cmake -B build -DGGML_VULKAN=1
cmake --build build --config Release
# Test the output binary (with "-ngl 33" to offload all layers to GPU)
./build/bin/llama-cli -m "PATH_TO_MODEL" -p "Hi you how are you" -n 50 -e -ngl 33 -t 4

# In the output, you should see that ggml_vulkan detected your GPU. For example:
# ggml_vulkan: Found 1 Vulkan devices:
# ggml_vulkan: 0 = AMD Radeon RX 6700 XT (RADV NAVI22) (radv) | uma: 0 | fp16: 1 | warp size: 64

Then you can download a model (e.g., from Hugging Face) and run it:

# Download llama3.2:3b
cd models && wget https://huggingface.co/bartowski/Llama-3.2-3B-Instruct-GGUF/resolve/main/Llama-3.2-3B-Instruct-Q4_K_M.gguf

# Run it.
cd ../
./build/bin/llama-cli -m "models/Llama-3.2-3B-Instruct-Q4_K_M.gguf" -p "Why is the blue sky blue?" -n 50 -e -ngl 33 -t 4
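
For throughput measurements, llama.cpp also builds a dedicated llama-bench binary alongside llama-cli. A minimal sketch (the 512-token prompt and 128-token generation lengths here are arbitrary picks, not necessarily the exact settings behind the table below):

# Benchmark prompt processing and token generation, offloading all layers to the GPU
./build/bin/llama-bench -m "models/Llama-3.2-3B-Instruct-Q4_K_M.gguf" -p 512 -n 128 -ngl 33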

Models tested so far, with results:

| Device | CPU/GPU | Model | Speed | Power (Peak) |
|--------|---------|-------|-------|--------------|
| Pi 5 - 8GB | CPU | llama3.2:3b | 4.61 Tokens/s | 13.9 W |
| Pi 5 - 8GB | CPU | llama3.1:8b | 1.99 Tokens/s | 13.2 W |
| Pi 5 - 8GB | CPU | llama2:13b | DNF | DNF |
| Pi 5 - 8GB / AMD RX 6500 XT 8GB | GPU | llama3.2:3b | 39.82 Tokens/s | 88 W |
| Pi 5 - 8GB / AMD RX 6500 XT 8GB | GPU | llama3.1:8b | 22.42 Tokens/s | 95.7 W |
| Pi 5 - 8GB / AMD RX 6500 XT 8GB | GPU | llama2:13b | 2.03 Tokens/s | 48.3 W |
| Pi 5 - 8GB / AMD RX 6700 XT 12GB | GPU | llama3.2:3b | 49.01 Tokens/s | 94 W |
| Pi 5 - 8GB / AMD RX 6700 XT 12GB | GPU | llama3.1:8b | 39.70 Tokens/s | 135 W |
| Pi 5 - 8GB / AMD RX 6700 XT 12GB | GPU | llama2:13b | 3.98 Tokens/s | 95 W |
| Pi 5 - 8GB / AMD RX 7600 8GB | GPU | llama3.2:3b | 48.47 Tokens/s | 156 W |
| Pi 5 - 8GB / AMD RX 7600 8GB | GPU | llama3.1:8b | 32.60 Tokens/s | 174 W |
| Pi 5 - 8GB / AMD RX 7600 8GB | GPU | llama2:13b | 2.42 Tokens/s | 106 W |
| Pi 5 - 8GB / AMD Radeon Pro W7700 16GB | GPU | llama3.2:3b | 56.14 Tokens/s | 145 W |
| Pi 5 - 8GB / AMD Radeon Pro W7700 16GB | GPU | llama3.1:8b | 39.87 Tokens/s | 52 W |
| Pi 5 - 8GB / AMD Radeon Pro W7700 16GB | GPU | llama2:13b | 4.38 Tokens/s | 108 W |

Note: Ollama doesn't currently support Vulkan, and some parts of llama.cpp still assume x86 rather than Arm or RISC-V.

Note 2: With larger models, you may run into an error like vk::Device::allocateMemory: ErrorOutOfDeviceMemory (see the bug Vulkan Device memory allocation failed). If so, try capping the maximum Vulkan allocation to 1 or 2 GB:

export GGML_VK_FORCE_MAX_ALLOCATION_SIZE=2147483647  # ~2 GiB (2^31 - 1, the max signed 32-bit value)
export GGML_VK_FORCE_MAX_ALLOCATION_SIZE=1073741824  # 1 GiB
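
For example, to apply the 1 GiB cap to a single run without exporting it into your shell session:

GGML_VK_FORCE_MAX_ALLOCATION_SIZE=1073741824 ./build/bin/llama-cli -m "PATH_TO_MODEL" -p "Why is the blue sky blue?" -n 50 -e -ngl 33 -t 4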

Note 3: Power consumption was measured at the wall (total system power draw) using a ThirdReality Zigbee Smart Outlet through Home Assistant. I don't have a way of measuring total energy consumed per test (e.g., in joules), but that would be nice at some point.
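
That said, a rough per-token energy figure can be derived from the table above: peak watts divided by tokens per second approximates joules per token (ignoring idle draw and prompt-processing time). For example, using the RX 6700 XT llama3.1:8b row:

echo "135 / 39.70" | bc -l  # ≈ 3.4 joules per token at the wall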
