Conversation

ggerganov
Member

No description provided.

lhez and others added 6 commits March 28, 2025 20:58
…2600)

* opencl: add `im2col`

* opencl: add `gelu_quick`

* opencl: add mrope

* opencl: add vision rope
…/12496)

* rpc : send hash when tensor data is above some fixed threshold

ref #10095

* rpc : put cache under $HOME/.cache/llama.cpp

* try to fix win32 build

* another try to fix win32 build

* remove llama as dependency
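
A minimal sketch of the hashing idea described in the rpc commits above, assuming a fixed size threshold and a simple FNV-1a placeholder hash (the names, threshold value, and message layout here are illustrative, not the actual rpc-server protocol):

```cpp
// Illustrative only: large tensor payloads are replaced by a hash; the
// receiver looks the blob up in a local cache (e.g. under
// $HOME/.cache/llama.cpp) and requests the full data only on a miss.
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr size_t HASH_THRESHOLD = 10 * 1024 * 1024; // hypothetical threshold

// FNV-1a 64-bit, used here only as a placeholder hash function
uint64_t hash_bytes(const uint8_t *data, size_t n) {
    uint64_t h = 14695981039346656037ull;
    for (size_t i = 0; i < n; ++i) {
        h ^= data[i];
        h *= 1099511628211ull;
    }
    return h;
}

struct Message {
    bool                 is_hash; // true: payload is a 64-bit hash, not raw bytes
    uint64_t             hash;
    std::vector<uint8_t> data;    // raw tensor bytes when is_hash == false
};

Message encode_tensor(const uint8_t *data, size_t n) {
    if (n >= HASH_THRESHOLD) {
        return {true, hash_bytes(data, n), {}};
    }
    return {false, 0, std::vector<uint8_t>(data, data + n)};
}
```

On the receiving side, the hash would serve as the cache key, with the full payload requested from the client only when the blob is not already cached.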
This patch enables usage of MMA when one of the
dimensions of the matrix (i.e. either M or N) is 1. This
is useful for token generation, where N < 2.

The concept of 'GEMV forwarding' is used: when one
of the matrices has a single row/column, its elements are
broadcast instead of running the packing routine to prepack
the matrix elements.
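
A minimal sketch of the 'GEMV forwarding' idea, assuming a 4-lane accumulator as a stand-in for an MMA accumulator tile (hypothetical code, not the actual POWER10 kernel):

```cpp
// Illustrative only: when B has a single column, its element b[k] is
// broadcast across the accumulator lanes so the tiled kernel layout can be
// reused without prepacking a 1-column matrix.
#include <array>
#include <cstddef>

using Tile = std::array<float, 4>; // stand-in for a 4-lane MMA accumulator row

void gemv_forward(const float *A,  // M x K, row-major
                  const float *b,  // K x 1 (single column of B)
                  float       *c,  // M x 1 result
                  size_t M, size_t K) {
    for (size_t i = 0; i + 4 <= M; i += 4) {
        Tile acc{}; // 4 output rows accumulated together
        for (size_t k = 0; k < K; ++k) {
            const float bk = b[k]; // broadcast: same value for every lane
            for (size_t lane = 0; lane < 4; ++lane) {
                acc[lane] += A[(i + lane)*K + k] * bk;
            }
        }
        for (size_t lane = 0; lane < 4; ++lane) {
            c[i + lane] = acc[lane];
        }
    }
    // tail rows (M not a multiple of 4) omitted for brevity
}
```

The point is that b[k] is reused across all lanes of the tile, so no packed copy of the single-column operand is ever built.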

This change results in a 5% - 15% improvement in total
speed (i.e. all tokens / total time) across various batch
sizes, in comparison with the corresponding dot-product
implementation.

The patch was tested with FP32 models of Meta-Llama-3-8B,
Mistral-7B, and Llama-2-7B-chat-hf on an IBM POWER10 machine.

Signed-off-by: Amrita H S <[email protected]>
* vulkan: fix coopmat shader generation when cross-compiling

Previously, the coopmat{,2} support status was not passed to the
vulkan-shaders-gen project built for the host, which led to build
failures because the cross-compiling code expected coopmat{,2}
shaders that were never generated.

Fix this by passing the coopmat{,2} support status to the vulkan-shaders
subproject.

Signed-off-by: Icenowy Zheng <[email protected]>

* Only call coop-mat shaders once

* Fix whitespace

---------

Signed-off-by: Icenowy Zheng <[email protected]>
Co-authored-by: bandoti <[email protected]>
* ggml : FA with different K, V head sizes (CPU)

ggml-ci

* metal : add FA with HS=192

* metal : extend FA to support different K and V head sizes

ggml-ci

* metal : add FA vector kernels for heads K 192 and V 128

ggml-ci

* ggml : restrict op on other backends to equal head sizes

ggml-ci

* metal : optimize FA-vec kernel

ggml-ci

* metal : FA remove mq registers

* metal : improve MoE mul_mat_id condition

ggml-ci

* metal : fix comments + remove unnecessary addition

ggml-ci

* metal : avoid too much shared memory usage with mul_mat_id

ggml-ci
ggml-ci
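
To illustrate the commits above that extend FA to different K and V head sizes: the attention output per head inherits V's head size (d_v), not K's (d_k), which is why kernels and backend support checks need to track the two sizes separately. A naive single-query sketch, assuming row-major std::vector storage (illustrative only, not the Metal or CPU kernels):

```cpp
// Illustrative only: attention for one query with K head size d_k and
// V head size d_v; the result has d_v elements.
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

std::vector<float> attend_one_query(const std::vector<float> &q,              // [d_k]
                                    const std::vector<std::vector<float>> &K, // [n_kv][d_k]
                                    const std::vector<std::vector<float>> &V, // [n_kv][d_v]
                                    float scale) {
    const size_t n_kv = K.size();
    const size_t d_v  = V.empty() ? 0 : V[0].size();

    // scores = softmax(scale * q . K_i), computed with max-subtraction
    std::vector<float> s(n_kv);
    float smax = -INFINITY;
    for (size_t i = 0; i < n_kv; ++i) {
        float dot = 0.0f;
        for (size_t j = 0; j < q.size(); ++j) {
            dot += q[j]*K[i][j];
        }
        s[i] = scale*dot;
        smax = std::max(smax, s[i]);
    }
    float sum = 0.0f;
    for (float &v : s) { v = std::exp(v - smax); sum += v; }

    // output = sum_i softmax_i * V_i -> has dimension d_v, not d_k
    std::vector<float> out(d_v, 0.0f);
    for (size_t i = 0; i < n_kv; ++i) {
        for (size_t j = 0; j < d_v; ++j) {
            out[j] += (s[i]/sum) * V[i][j];
        }
    }
    return out;
}
```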
@ggerganov ggerganov merged commit 6e7629b into master Mar 28, 2025
57 checks passed
@ggerganov ggerganov deleted the sync-ggml-25-03-28 branch March 28, 2025 19:47