Name and Version
$ ./build/bin/llama-cli --version
register_backend: registered backend Metal (1 devices)
register_device: registered device Metal (Apple M3)
register_backend: registered backend BLAS (1 devices)
register_device: registered device BLAS (Accelerate)
register_backend: registered backend CPU (1 devices)
register_device: registered device CPU (Apple M3)
version: 4830 (d78b4df0)
built with Apple clang version 16.0.0 (clang-1600.0.26.6) for arm64-apple-darwin23.6.0
Operating systems
Mac
Which llama.cpp modules do you know to be affected?
Other (Please specify in the next section)
Command line
Problem description & steps to reproduce
When running the LlamaSwiftUI example in the iOS Simulator and trying to load a model, the following error occurs:
ggml_metal_init: loaded kernel_ssm_conv_f32 0x0 | th_max = 0 | th_width = 0
ggml_metal_init: error: load pipeline error: Error Domain=CompilerError Code=2 "only 14 constant buffers binding are supported in the simulator but 16 were used" UserInfo={NSLocalizedDescription=only 14 constant buffers binding are supported in the simulator but 16 were used}
ggml_backend_metal_device_init: error: failed to allocate context
llama_init_from_model: failed to initialize Metal backend
Could not load context!
Error: The operation couldn’t be completed. (llama_swiftui.LlamaError error 0.)
This seems to be coming from the kernel_ssm_conv_f32 kernel function:
// ref: ggml.c:ggml_compute_forward_ssm_conv_f32
// TODO: optimize
kernel void kernel_ssm_conv_f32(
device const void * src0,
device const void * src1,
device float * dst,
constant int64_t & ne00,
constant int64_t & ne01,
constant int64_t & ne02,
constant uint64_t & nb00,
constant uint64_t & nb01,
constant uint64_t & nb02,
constant int64_t & ne10,
constant int64_t & ne11,
constant uint64_t & nb10,
constant uint64_t & nb11,
constant int64_t & ne0,
constant int64_t & ne1,
constant int64_t & ne2,
constant uint64_t & nb0,
constant uint64_t & nb1,
constant uint64_t & nb2,
uint3 tgpig[[threadgroup_position_in_grid]],
uint3 tpitg[[thread_position_in_threadgroup]],
uint3 ntg[[threads_per_threadgroup]]) {
Of this kernel's parameters, 16 are constant buffer bindings, but the simulator only supports 14. I tried simply adding structs for the kernel functions the compiler complained about, and this allowed the example to run (load a model and perform inference); a rough sketch is included below.
There is already an open issue and ongoing work to create these structs. I'm just opening this issue for awareness/tracking.
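For illustration, a minimal sketch of the struct approach, assuming the scalar arguments are packed into one struct (the struct name, field layout, and buffer index here are my own placeholders, not necessarily what the ongoing work uses):

// hypothetical argument struct shared between the host code and the .metal source
typedef struct {
    int64_t  ne00;
    int64_t  ne01;
    int64_t  ne02;
    uint64_t nb00;
    uint64_t nb01;
    uint64_t nb02;
    int64_t  ne10;
    int64_t  ne11;
    uint64_t nb10;
    uint64_t nb11;
    int64_t  ne0;
    int64_t  ne1;
    int64_t  ne2;
    uint64_t nb0;
    uint64_t nb1;
    uint64_t nb2;
} ssm_conv_args_t;

kernel void kernel_ssm_conv_f32(
        device const void * src0,
        device const void * src1,
        device       float * dst,
        constant ssm_conv_args_t & args, // one constant buffer binding instead of 16
        uint3 tgpig[[threadgroup_position_in_grid]],
        uint3 tpitg[[thread_position_in_threadgroup]],
        uint3   ntg[[threads_per_threadgroup]]) {
    // body unchanged, except each scalar is now read as args.ne00, args.nb01, etc.
}

On the host side the struct could be filled in ggml-metal.m and passed with a single [encoder setBytes:&args length:sizeof(args) atIndex:...] call, which keeps the binding count well under the simulator's limit of 14.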
First Bad Commit
No response