# Experimental support for Ryzen 7x40 (Linux)
In my case, a 7940HS with 64 GB of RAM, on Fedora 41 with ROCm HIP 6.2.1.

The backend only adds a matmul (BF16) OP implemented with HIP (rocBLAS is not used). There is no limit on RAM usage (GTT/VRAM); weights are allocated in RAM.
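
For intuition, the work the BF16 matmul OP does can be sketched in plain C++ (a reference sketch, not this backend's HIP kernel): bf16 is the upper 16 bits of an IEEE-754 float32, and token generation reduces to a GEMV per weight matrix.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// bf16 is the upper 16 bits of an IEEE-754 float32.
static float bf16_to_f32(uint16_t h) {
    uint32_t u = static_cast<uint32_t>(h) << 16;
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

// Reference GEMV y = W * x with W row-major in bf16 and x, y in fp32:
// the shape of work a bf16 mulmat OP does during token generation
// (batch size 1, one dot product per output row).
static void gemv_bf16(const uint16_t* W, const float* x, float* y,
                      int rows, int cols) {
    for (int r = 0; r < rows; ++r) {
        float acc = 0.0f; // accumulate in fp32 for accuracy
        for (int c = 0; c < cols; ++c)
            acc += bf16_to_f32(W[r * cols + c]) * x[c];
        y[r] = acc;
    }
}
```

A real HIP kernel would distribute the row loop across workgroups and vectorize the inner dot product; this scalar version only pins down the arithmetic.
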
build/igpu/bin/llama-cli --color -ngl 999 --no-mmap -ctk bf16 -ctv bf16 -m Meta-

To be fair, there are some random crashes with an 'MES' error; this may need a fix in the AMD firmware.

01/03/2025: first version of the kernel (V1) (supports only BF16 quantization)
14/03/2025: created a new kernel (V2) (supports only BF16 quantization)

Note: the V2 kernel has a dedicated kernel for GEMV (i.e. token generation)
Next:
- adapt the V1 kernel for small prompt processing (2-32?)
- create a kernel for FP8 and support optional conversion of weights (FP16/BF16/FP32) to FP8 on load.
- add FP16 quantization support
- create a true block kernel for the CPU ("blis"-like)?
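
For the planned on-load weight conversion, the fp32 → bf16 step can be sketched as follows (round-to-nearest-even; an illustrative sketch, not this backend's code):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Convert fp32 -> bf16 by rounding to nearest even and keeping the
// upper 16 bits; NaN/Inf special-casing is omitted for brevity.
static uint16_t f32_to_bf16(float f) {
    uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    uint32_t bias = 0x7FFFu + ((u >> 16) & 1u); // round-to-nearest-even bias
    return static_cast<uint16_t>((u + bias) >> 16);
}
```

FP16 and FP32 weights convert exactly or with one rounding step, which is why a single on-load pass is enough.
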

Some results (when it does not crash):

## Llama-3.2-1B-Instruct/BF16.gguf
| model | size | params | type_k | type_v | test | CPU t/s | V1 t/s | V2 t/s |
| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| llama 1B BF16 | 2.30 GiB | 1.24 B | bf16 | bf16 | pp1 | 23.26 ± 0.02 | 18.53 ± 0.17 | 27.59 ± 0.05 |
| llama 1B BF16 | 2.30 GiB | 1.24 B | bf16 | bf16 | pp2 | 45.39 ± 0.04 | 36.20 ± 0.33 | 34.22 ± 0.03 |
| llama 1B BF16 | 2.30 GiB | 1.24 B | bf16 | bf16 | pp4 | 90.47 ± 0.06 | 71.78 ± 0.22 | 65.12 ± 0.23 |
| llama 1B BF16 | 2.30 GiB | 1.24 B | bf16 | bf16 | pp8 | 176.86 ± 2.93 | 139.26 ± 1.73 | 119.79 ± 0.08 |
| llama 1B BF16 | 2.30 GiB | 1.24 B | bf16 | bf16 | pp16 | 344.33 ± 0.26 | 266.42 ± 3.15 | 200.51 ± 0.99 |
| llama 1B BF16 | 2.30 GiB | 1.24 B | bf16 | bf16 | pp32 | 562.30 ± 9.50 | 422.50 ± 2.38 | 429.52 ± 0.68 |
| llama 1B BF16 | 2.30 GiB | 1.24 B | bf16 | bf16 | pp48 | 665.70 ± 9.38 | 653.25 ± 1.98 | 601.83 ± 2.96 |
| llama 1B BF16 | 2.30 GiB | 1.24 B | bf16 | bf16 | pp64 | 679.13 ± 8.38 | 717.96 ± 4.81 | 760.94 ± 0.32 |
| llama 1B BF16 | 2.30 GiB | 1.24 B | bf16 | bf16 | pp128 | 723.15 ± 3.93 | 990.37 ± 1.74 | 1062.69 ± 2.78 |
| llama 1B BF16 | 2.30 GiB | 1.24 B | bf16 | bf16 | pp192 | 738.65 ± 3.53 | 1131.50 ± 6.55 | 1304.20 ± 1.37 |
| llama 1B BF16 | 2.30 GiB | 1.24 B | bf16 | bf16 | pp256 | 746.87 ± 2.49 | 1151.29 ± 7.71 | 1326.96 ± 2.51 |
| llama 1B BF16 | 2.30 GiB | 1.24 B | bf16 | bf16 | pp384 | 714.54 ± 5.95 | 1178.65 ± 1.41 | 1220.25 ± 3.79 |
| llama 1B BF16 | 2.30 GiB | 1.24 B | bf16 | bf16 | pp512 | 677.09 ± 2.49 | 963.16 ± 0.77 | 950.69 ± 1.97 |
| llama 1B BF16 | 2.30 GiB | 1.24 B | bf16 | bf16 | pp768 | 665.30 ± 1.35 | 901.93 ± 1.94 | 884.07 ± 1.66 |
| llama 1B BF16 | 2.30 GiB | 1.24 B | bf16 | bf16 | tg16 | 23.00 ± 0.10 | 18.26 ± 0.04 | 27.69 ± 0.08 |

## Llama-3.2-3B-Instruct/BF16.gguf
| model | size | params | type_k | type_v | test | CPU t/s | V1 t/s | V2 t/s |
| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| llama 3B BF16 | 5.98 GiB | 3.21 B | bf16 | bf16 | pp1 | 8.94 ± 0.07 | 7.85 ± 0.05 | 11.03 ± 0.02 |
| llama 3B BF16 | 5.98 GiB | 3.21 B | bf16 | bf16 | pp2 | 17.56 ± 0.15 | 15.67 ± 0.04 | 14.61 ± 0.01 |
| llama 3B BF16 | 5.98 GiB | 3.21 B | bf16 | bf16 | pp4 | 35.02 ± 0.25 | 31.11 ± 0.29 | 27.86 ± 0.01 |
| llama 3B BF16 | 5.98 GiB | 3.21 B | bf16 | bf16 | pp8 | 69.18 ± 0.52 | 61.01 ± 0.17 | 51.21 ± 0.03 |
| llama 3B BF16 | 5.98 GiB | 3.21 B | bf16 | bf16 | pp16 | 131.72 ± 1.09 | 117.77 ± 0.26 | 86.80 ± 0.16 |
| llama 3B BF16 | 5.98 GiB | 3.21 B | bf16 | bf16 | pp32 | 209.28 ± 3.42 | 185.05 ± 0.90 | 178.08 ± 0.27 |
| llama 3B BF16 | 5.98 GiB | 3.21 B | bf16 | bf16 | pp48 | 232.70 ± 3.41 | 273.60 ± 0.66 | 249.61 ± 0.25 |
| llama 3B BF16 | 5.98 GiB | 3.21 B | bf16 | bf16 | pp64 | 237.90 ± 3.47 | 300.62 ± 0.68 | 313.17 ± 0.33 |
| llama 3B BF16 | 5.98 GiB | 3.21 B | bf16 | bf16 | pp128 | 261.37 ± 4.04 | 390.84 ± 0.55 | 438.12 ± 0.16 |
| llama 3B BF16 | 5.98 GiB | 3.21 B | bf16 | bf16 | pp192 | 263.82 ± 0.60 | 445.00 ± 1.70 | 506.12 ± 0.62 |
| llama 3B BF16 | 5.98 GiB | 3.21 B | bf16 | bf16 | pp256 | 265.27 ± 0.71 | 450.11 ± 5.35 | 516.21 ± 7.91 |
| llama 3B BF16 | 5.98 GiB | 3.21 B | bf16 | bf16 | pp384 | 261.27 ± 0.61 | 470.54 ± 0.32 | 485.27 ± 1.81 |
| llama 3B BF16 | 5.98 GiB | 3.21 B | bf16 | bf16 | pp512 | 254.72 ± 0.25 | 441.51 ± 2.56 | 480.40 ± 0.19 |
| llama 3B BF16 | 5.98 GiB | 3.21 B | bf16 | bf16 | pp768 | 253.87 ± 0.41 | 429.79 ± 0.43 | 462.86 ± 0.30 |
| llama 3B BF16 | 5.98 GiB | 3.21 B | bf16 | bf16 | tg16 | 8.90 ± 0.03 | 7.85 ± 0.02 | 11.02 ± 0.00 |

## Meta-Llama-3.1-8B-Instruct/BF16.gguf
| model | size | params | type_k | type_v | test | CPU t/s | V1 t/s | V2 t/s |
| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| llama 8B BF16 | 14.96 GiB | 8.03 B | bf16 | bf16 | pp1 | 3.88 ± 0.01 | 3.88 ± 0.01 | 4.88 ± 0.00 |
| llama 8B BF16 | 14.96 GiB | 8.03 B | bf16 | bf16 | pp2 | 7.59 ± 0.00 | 7.74 ± 0.01 | 7.40 ± 0.01 |
| llama 8B BF16 | 14.96 GiB | 8.03 B | bf16 | bf16 | pp4 | 15.04 ± 0.06 | 15.43 ± 0.11 | 14.20 ± 0.03 |
| llama 8B BF16 | 14.96 GiB | 8.03 B | bf16 | bf16 | pp8 | 29.73 ± 0.13 | 30.23 ± 0.08 | 26.37 ± 0.02 |
| llama 8B BF16 | 14.96 GiB | 8.03 B | bf16 | bf16 | pp16 | 56.55 ± 0.27 | 58.55 ± 0.53 | 45.95 ± 0.04 |
| llama 8B BF16 | 14.96 GiB | 8.03 B | bf16 | bf16 | pp32 | 84.81 ± 0.90 | 91.54 ± 0.28 | 83.38 ± 0.01 |
| llama 8B BF16 | 14.96 GiB | 8.03 B | bf16 | bf16 | pp48 | 90.43 ± 1.76 | 114.77 ± 0.42 | 116.55 ± 0.09 |
| llama 8B BF16 | 14.96 GiB | 8.03 B | bf16 | bf16 | pp64 | 85.45 ± 0.71 | 137.17 ± 0.31 | 139.46 ± 1.12 |
| llama 8B BF16 | 14.96 GiB | 8.03 B | bf16 | bf16 | pp128 | 103.68 ± 0.13 | 152.59 ± 1.25 | 195.33 ± 0.22 |
| llama 8B BF16 | 14.96 GiB | 8.03 B | bf16 | bf16 | pp192 | 107.07 ± 0.18 | 183.30 ± 0.56 | 215.62 ± 0.93 |
| llama 8B BF16 | 14.96 GiB | 8.03 B | bf16 | bf16 | pp256 | 107.43 ± 0.28 | 185.74 ± 1.14 | 235.19 ± 0.86 |
| llama 8B BF16 | 14.96 GiB | 8.03 B | bf16 | bf16 | pp384 | 106.74 ± 0.11 | 213.56 ± 1.07 | 230.65 ± 0.09 |
| llama 8B BF16 | 14.96 GiB | 8.03 B | bf16 | bf16 | pp512 | 104.39 ± 0.17 | 203.01 ± 0.39 | 232.16 ± 0.25 |
| llama 8B BF16 | 14.96 GiB | 8.03 B | bf16 | bf16 | pp768 | 104.19 ± 0.10 | 194.98 ± 0.57 | 225.46 ± 0.40 |
| llama 8B BF16 | 14.96 GiB | 8.03 B | bf16 | bf16 | tg16 | 3.88 ± 0.01 | 3.88 ± 0.01 | 4.87 ± 0.01 |

## Mistral-Nemo-Instruct-2407/BF16.gguf
| model | size | params | type_k | type_v | test | CPU t/s | V1 t/s | V2 t/s |
| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| llama 12B BF16 | 22.81 GiB | 12.25 B | bf16 | bf16 | pp1 | 2.52 ± 0.00 | 2.76 ± 0.00 | 3.16 ± 0.01 |
| llama 12B BF16 | 22.81 GiB | 12.25 B | bf16 | bf16 | pp2 | 4.94 ± 0.01 | 5.49 ± 0.01 | 4.90 ± 0.00 |
| llama 12B BF16 | 22.81 GiB | 12.25 B | bf16 | bf16 | pp4 | 9.82 ± 0.03 | 10.92 ± 0.01 | 9.42 ± 0.06 |
| llama 12B BF16 | 22.81 GiB | 12.25 B | bf16 | bf16 | pp8 | 19.40 ± 0.09 | 21.60 ± 0.02 | 17.56 ± 0.01 |
| llama 12B BF16 | 22.81 GiB | 12.25 B | bf16 | bf16 | pp16 | 36.85 ± 0.35 | 42.03 ± 0.04 | 30.77 ± 0.06 |
| llama 12B BF16 | 22.81 GiB | 12.25 B | bf16 | bf16 | pp32 | 50.40 ± 0.97 | 65.33 ± 0.09 | 56.43 ± 0.12 |
| llama 12B BF16 | 22.81 GiB | 12.25 B | bf16 | bf16 | pp48 | 52.77 ± 1.60 | 77.46 ± 0.12 | 76.93 ± 0.22 |
| llama 12B BF16 | 22.81 GiB | 12.25 B | bf16 | bf16 | pp64 | 54.65 ± 0.30 | 94.48 ± 0.20 | 93.57 ± 0.05 |
| llama 12B BF16 | 22.81 GiB | 12.25 B | bf16 | bf16 | pp128 | 65.72 ± 0.14 | 103.87 ± 0.43 | 127.90 ± 0.08 |
| llama 12B BF16 | 22.81 GiB | 12.25 B | bf16 | bf16 | pp192 | 67.66 ± 0.16 | 121.43 ± 0.18 | 143.60 ± 0.22 |
| llama 12B BF16 | 22.81 GiB | 12.25 B | bf16 | bf16 | pp256 | 68.45 ± 0.13 | 130.03 ± 0.24 | 156.00 ± 0.32 |
| llama 12B BF16 | 22.81 GiB | 12.25 B | bf16 | bf16 | pp384 | 67.64 ± 0.08 | 142.89 ± 0.07 | 154.52 ± 0.31 |
| llama 12B BF16 | 22.81 GiB | 12.25 B | bf16 | bf16 | pp512 | 67.02 ± 0.05 | 136.18 ± 0.06 | 156.22 ± 0.15 |
| llama 12B BF16 | 22.81 GiB | 12.25 B | bf16 | bf16 | pp768 | 66.74 ± 0.03 | 130.78 ± 0.13 | 151.59 ± 0.67 |
| llama 12B BF16 | 22.81 GiB | 12.25 B | bf16 | bf16 | tg16 | 2.52 ± 0.00 | 2.76 ± 0.00 | 3.16 ± 0.00 |

## Mistral-Small-24B-Instruct-2501/BF16.gguf
| model | size | params | type_k | type_v | test | CPU t/s | V1 t/s | V2 t/s |
| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| llama 24B BF16 | 43.91 GiB | 23.57 B | bf16 | bf16 | pp1 | 1.28 ± 0.00 | 1.39 ± 0.00 | 1.64 ± 0.00 |
| llama 24B BF16 | 43.91 GiB | 23.57 B | bf16 | bf16 | pp2 | 2.52 ± 0.01 | 2.76 ± 0.00 | 2.71 ± 0.00 |
| llama 24B BF16 | 43.91 GiB | 23.57 B | bf16 | bf16 | pp4 | 5.02 ± 0.01 | 5.50 ± 0.01 | 5.26 ± 0.01 |
| llama 24B BF16 | 43.91 GiB | 23.57 B | bf16 | bf16 | pp8 | 9.87 ± 0.02 | 10.89 ± 0.02 | 9.94 ± 0.01 |
| llama 24B BF16 | 43.91 GiB | 23.57 B | bf16 | bf16 | pp16 | 18.32 ± 0.07 | 21.32 ± 0.03 | 17.86 ± 0.02 |
| llama 24B BF16 | 43.91 GiB | 23.57 B | bf16 | bf16 | pp32 | 25.53 ± 0.24 | 34.65 ± 0.02 | 31.50 ± 0.03 |
| llama 24B BF16 | 43.91 GiB | 23.57 B | bf16 | bf16 | pp48 | 24.53 ± 0.30 | 36.05 ± 0.02 | 43.93 ± 0.02 |
| llama 24B BF16 | 43.91 GiB | 23.57 B | bf16 | bf16 | pp64 | 25.88 ± 0.13 | 47.87 ± 0.16 | 53.96 ± 0.06 |
| llama 24B BF16 | 43.91 GiB | 23.57 B | bf16 | bf16 | pp128 | 29.69 ± 0.07 | 52.03 ± 0.23 | 69.64 ± 0.06 |
| llama 24B BF16 | 43.91 GiB | 23.57 B | bf16 | bf16 | pp192 | 29.99 ± 0.05 | 61.00 ± 0.18 | 79.73 ± 0.19 |
| llama 24B BF16 | 43.91 GiB | 23.57 B | bf16 | bf16 | pp256 | 30.94 ± 0.02 | 63.11 ± 0.29 | 87.30 ± 0.27 |
| llama 24B BF16 | 43.91 GiB | 23.57 B | bf16 | bf16 | pp384 | 32.51 ± 0.01 | 75.00 ± 0.25 | 86.26 ± 0.18 |
| llama 24B BF16 | 43.91 GiB | 23.57 B | bf16 | bf16 | pp512 | 32.28 ± 0.01 | 71.11 ± 0.18 | 88.11 ± 0.13 |
| llama 24B BF16 | 43.91 GiB | 23.57 B | bf16 | bf16 | pp768 | 32.02 ± 0.09 | 67.33 ± 0.13 | 85.47 ± 0.09 |
| llama 24B BF16 | 43.91 GiB | 23.57 B | bf16 | bf16 | tg16 | 1.28 ± 0.00 | 1.38 ± 0.00 | 1.62 ± 0.00 |

-------------------------------
