Name and Version
version: 4529 (12c2bdf) built with MSVC 19.29.30157.0 for
Operating systems
Windows
Which llama.cpp modules do you know to be affected?
llama-server
Command line
llama-server.exe -c 1024 -m index-1.9B-Q8_0.gguf --verbose
Problem description & steps to reproduce
The server crashes when it tries to use a model with a chat template that is no longer supported; the fallback to the old mechanism is broken. Running the command above, it just crashes on startup (close code 3221226505 = 0xC0000409, the Windows fail-fast / STATUS_STACK_BUFFER_OVERRUN status, which MSVC builds commonly report when a process aborts on an uncaught exception; that's my guess at the cause, not confirmed). Before the recent jinja changes, everything worked fine with this model.
Even if some parts of the template are not supported, the fallback mechanism should still kick in. Right now the server just crashes.
The template in the model is the fun thing below, although I believe the template itself is not really the point: it's not about supporting this thing, but about using the old code path to avoid crashing. A minimal sketch of the fallback behavior I'd expect follows the template.
{% if messages[0]['role'] == 'system' %}{% set system_message = messages[0]['content'] %}{% endif %}{% if system_message is defined %}{{ unk_token + system_message }}{% endif %}{% for message in messages %}{% set content = message['content'] %}{% if message['role'] == 'user' %}{{ 'reserved_0' + content.strip() + 'reserved_1' }}{% elif message['role'] == 'assistant' %}{{ content.strip() }}{% endif %}{% endfor %}
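This is not llama.cpp's actual code, just a self-contained sketch of the guard I'd expect, with hypothetical stand-ins (render_with_jinja, render_legacy, apply_chat_template are illustrative names, not the real API): try the jinja path, and on any parse/render error fall back to the legacy handler instead of letting the exception escape and kill the process.

#include <iostream>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the new minja-based renderer: throws on
// templates it cannot handle (as appears to happen with the one above).
static std::string render_with_jinja(const std::string & tmpl) {
    (void) tmpl;
    throw std::runtime_error("unsupported template construct");
}

// Hypothetical stand-in for the pre-jinja path: detect a known template
// or fall back to chatml; it should never throw.
static std::string render_legacy(const std::string & tmpl) {
    (void) tmpl;
    return "chatml";
}

static std::string apply_chat_template(const std::string & tmpl) {
    try {
        return render_with_jinja(tmpl);
    } catch (const std::exception & e) {
        // Expected behavior: log and degrade gracefully instead of
        // crashing the server (close code 3221226505 / 0xC0000409).
        std::cerr << "template failed (" << e.what()
                  << "), falling back to legacy handling\n";
        return render_legacy(tmpl);
    }
}

int main() {
    std::cout << apply_chat_template("{% ... %}") << "\n";  // prints "chatml"
}

In other words, the catch block above is the missing piece: the server already prints the "falling back to chatml" warning (see the last log line), but the process still dies right after, so the exception presumably escapes somewhere else on the template path.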
First Bad Commit
It didn't happen in version b4404; I don't know exactly when the jinja support was introduced.
Relevant log output
llama-server.exe -c 1024 -m index-1.9B-Q8_0.gguf --verbose
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 2 CUDA devices:
Device 0: NVIDIA GeForce GT 1030, compute capability 6.1, VMM: yes
Device 1: NVIDIA GeForce GT 1030, compute capability 6.1, VMM: yes
build: 4529 (12c2bdf2) with MSVC 19.29.30157.0 for
system info: n_threads = 6, n_threads_batch = 6, total_threads = 12
system_info: n_threads = 6 (n_threads_batch = 6) / 12 | CUDA : ARCHS = 520,610,700,750 | USE_GRAPHS = 1 | PEER_MAX_BATCH_SIZE = 128 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | LLAMAFILE = 1 | OPENMP = 1 | AARCH64_REPACK = 1 |
main: HTTP server is listening, hostname: 127.0.0.1, port: 8080, http threads: 11
main: loading model
srv load_model: loading model 'index-1.9B-Q8_0.gguf'
llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce GT 1030) - 1640 MiB free
llama_model_load_from_file_impl: using device CUDA1 (NVIDIA GeForce GT 1030) - 1640 MiB free
llama_model_loader: loaded meta data with 25 key-value pairs and 327 tensors from index-1.9B-Q8_0.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = Index-1.9B-Character_test
llama_model_loader: - kv 2: llama.block_count u32 = 36
llama_model_loader: - kv 3: llama.context_length u32 = 4096
llama_model_loader: - kv 4: llama.embedding_length u32 = 2048
llama_model_loader: - kv 5: llama.feed_forward_length u32 = 5888
llama_model_loader: - kv 6: llama.attention.head_count u32 = 16
llama_model_loader: - kv 7: llama.attention.head_count_kv u32 = 16
llama_model_loader: - kv 8: llama.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 9: general.file_type u32 = 7
llama_model_loader: - kv 10: llama.vocab_size u32 = 65029
llama_model_loader: - kv 11: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 12: tokenizer.ggml.add_space_prefix bool = false
llama_model_loader: - kv 13: tokenizer.ggml.model str = llama
llama_model_loader: - kv 14: tokenizer.ggml.pre str = default
llama_model_loader: - kv 15: tokenizer.ggml.tokens arr[str,65029] = ["<unk>", "<s>", "</s>", "reserved_0"...
llama_model_loader: - kv 16: tokenizer.ggml.scores arr[f32,65029] = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv 17: tokenizer.ggml.token_type arr[i32,65029] = [2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, ...
llama_model_loader: - kv 18: tokenizer.ggml.bos_token_id u32 = 1
llama_model_loader: - kv 19: tokenizer.ggml.eos_token_id u32 = 2
llama_model_loader: - kv 20: tokenizer.ggml.padding_token_id u32 = 0
llama_model_loader: - kv 21: tokenizer.ggml.add_bos_token bool = false
llama_model_loader: - kv 22: tokenizer.ggml.add_eos_token bool = false
llama_model_loader: - kv 23: tokenizer.chat_template str = {% if messages[0]['role'] == 'system'...
llama_model_loader: - kv 24: general.quantization_version u32 = 2
llama_model_loader: - type f32: 73 tensors
llama_model_loader: - type q8_0: 254 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = Q8_0
print_info: file size = 2.15 GiB (8.50 BPW)
init_tokenizer: initializing tokenizer for type 1
load: control token: 2 '</s>' is not marked as EOG
load: control token: 37 'reserved_34' is not marked as EOG
load: control token: 42 'reserved_39' is not marked as EOG
load: control token: 3 'reserved_0' is not marked as EOG
load: control token: 43 'reserved_40' is not marked as EOG
load: control token: 16 'reserved_13' is not marked as EOG
load: control token: 112 'reserved_109' is not marked as EOG
load: control token: 1 '<s>' is not marked as EOG
load: control token: 41 'reserved_38' is not marked as EOG
load: control token: 4 'reserved_1' is not marked as EOG
load: control token: 5 'reserved_2' is not marked as EOG
load: control token: 121 'reserved_118' is not marked as EOG
load: control token: 102 'reserved_99' is not marked as EOG
load: control token: 6 'reserved_3' is not marked as EOG
load: control token: 122 'reserved_119' is not marked as EOG
load: control token: 101 'reserved_98' is not marked as EOG
load: control token: 7 'reserved_4' is not marked as EOG
load: control token: 8 'reserved_5' is not marked as EOG
load: control token: 9 'reserved_6' is not marked as EOG
load: control token: 10 'reserved_7' is not marked as EOG
load: control token: 34 'reserved_31' is not marked as EOG
load: control token: 11 'reserved_8' is not marked as EOG
load: control token: 115 'reserved_112' is not marked as EOG
load: control token: 96 'reserved_93' is not marked as EOG
load: control token: 33 'reserved_30' is not marked as EOG
load: control token: 12 'reserved_9' is not marked as EOG
load: control token: 116 'reserved_113' is not marked as EOG
load: control token: 95 'reserved_92' is not marked as EOG
load: control token: 46 'reserved_43' is not marked as EOG
load: control token: 13 'reserved_10' is not marked as EOG
load: control token: 45 'reserved_42' is not marked as EOG
load: control token: 14 'reserved_11' is not marked as EOG
load: control token: 44 'reserved_41' is not marked as EOG
load: control token: 15 'reserved_12' is not marked as EOG
load: control token: 50 'reserved_47' is not marked as EOG
load: control token: 17 'reserved_14' is not marked as EOG
load: control token: 49 'reserved_46' is not marked as EOG
load: control token: 18 'reserved_15' is not marked as EOG
load: control token: 48 'reserved_45' is not marked as EOG
load: control token: 19 'reserved_16' is not marked as EOG
load: control token: 47 'reserved_44' is not marked as EOG
load: control token: 20 'reserved_17' is not marked as EOG
load: control token: 21 'reserved_18' is not marked as EOG
load: control token: 22 'reserved_19' is not marked as EOG
load: control token: 23 'reserved_20' is not marked as EOG
load: control token: 24 'reserved_21' is not marked as EOG
load: control token: 192 'reserved_189' is not marked as EOG
load: control token: 25 'reserved_22' is not marked as EOG
load: control token: 26 'reserved_23' is not marked as EOG
load: control token: 191 'reserved_188' is not marked as EOG
load: control token: 27 'reserved_24' is not marked as EOG
load: control token: 28 'reserved_25' is not marked as EOG
load: control token: 29 'reserved_26' is not marked as EOG
load: control token: 30 'reserved_27' is not marked as EOG
load: control token: 186 'reserved_183' is not marked as EOG
load: control token: 31 'reserved_28' is not marked as EOG
load: control token: 185 'reserved_182' is not marked as EOG
load: control token: 32 'reserved_29' is not marked as EOG
load: control token: 35 'reserved_32' is not marked as EOG
load: control token: 36 'reserved_33' is not marked as EOG
load: control token: 38 'reserved_35' is not marked as EOG
load: control token: 39 'reserved_36' is not marked as EOG
load: control token: 40 'reserved_37' is not marked as EOG
load: control token: 51 'reserved_48' is not marked as EOG
load: control token: 52 'reserved_49' is not marked as EOG
load: control token: 53 'reserved_50' is not marked as EOG
load: control token: 54 'reserved_51' is not marked as EOG
load: control token: 55 'reserved_52' is not marked as EOG
load: control token: 56 'reserved_53' is not marked as EOG
load: control token: 57 'reserved_54' is not marked as EOG
load: control token: 202 'reserved_199' is not marked as EOG
load: control token: 58 'reserved_55' is not marked as EOG
load: control token: 201 'reserved_198' is not marked as EOG
load: control token: 59 'reserved_56' is not marked as EOG
load: control token: 60 'reserved_57' is not marked as EOG
load: control token: 61 'reserved_58' is not marked as EOG
load: control token: 198 'reserved_195' is not marked as EOG
load: control token: 62 'reserved_59' is not marked as EOG
load: control token: 197 'reserved_194' is not marked as EOG
load: control token: 63 'reserved_60' is not marked as EOG
load: control token: 64 'reserved_61' is not marked as EOG
load: control token: 65 'reserved_62' is not marked as EOG
load: control token: 66 'reserved_63' is not marked as EOG
load: control token: 67 'reserved_64' is not marked as EOG
load: control token: 242 'reserved_239' is not marked as EOG
load: control token: 68 'reserved_65' is not marked as EOG
load: control token: 241 'reserved_238' is not marked as EOG
load: control token: 69 'reserved_66' is not marked as EOG
load: control token: 70 'reserved_67' is not marked as EOG
load: control token: 71 'reserved_68' is not marked as EOG
load: control token: 238 'reserved_235' is not marked as EOG
load: control token: 72 'reserved_69' is not marked as EOG
load: control token: 237 'reserved_234' is not marked as EOG
load: control token: 73 'reserved_70' is not marked as EOG
load: control token: 74 'reserved_71' is not marked as EOG
load: control token: 75 'reserved_72' is not marked as EOG
load: control token: 76 'reserved_73' is not marked as EOG
load: control token: 77 'reserved_74' is not marked as EOG
load: control token: 78 'reserved_75' is not marked as EOG
load: control token: 79 'reserved_76' is not marked as EOG
load: control token: 80 'reserved_77' is not marked as EOG
load: control token: 81 'reserved_78' is not marked as EOG
load: control token: 82 'reserved_79' is not marked as EOG
load: control token: 166 'reserved_163' is not marked as EOG
load: control token: 83 'reserved_80' is not marked as EOG
load: control token: 165 'reserved_162' is not marked as EOG
load: control token: 84 'reserved_81' is not marked as EOG
load: control token: 85 'reserved_82' is not marked as EOG
load: control token: 164 'reserved_161' is not marked as EOG
load: control token: 163 'reserved_160' is not marked as EOG
load: control token: 86 'reserved_83' is not marked as EOG
load: control token: 170 'reserved_167' is not marked as EOG
load: control token: 87 'reserved_84' is not marked as EOG
load: control token: 169 'reserved_166' is not marked as EOG
load: control token: 88 'reserved_85' is not marked as EOG
load: control token: 168 'reserved_165' is not marked as EOG
load: control token: 89 'reserved_86' is not marked as EOG
load: control token: 167 'reserved_164' is not marked as EOG
load: control token: 90 'reserved_87' is not marked as EOG
load: control token: 91 'reserved_88' is not marked as EOG
load: control token: 92 'reserved_89' is not marked as EOG
load: control token: 114 'reserved_111' is not marked as EOG
load: control token: 93 'reserved_90' is not marked as EOG
load: control token: 113 'reserved_110' is not marked as EOG
load: control token: 94 'reserved_91' is not marked as EOG
load: control token: 118 'reserved_115' is not marked as EOG
load: control token: 97 'reserved_94' is not marked as EOG
load: control token: 117 'reserved_114' is not marked as EOG
load: control token: 98 'reserved_95' is not marked as EOG
load: control token: 120 'reserved_117' is not marked as EOG
load: control token: 99 'reserved_96' is not marked as EOG
load: control token: 119 'reserved_116' is not marked as EOG
load: control token: 100 'reserved_97' is not marked as EOG
load: control token: 103 'reserved_100' is not marked as EOG
load: control token: 104 'reserved_101' is not marked as EOG
load: control token: 105 'reserved_102' is not marked as EOG
load: control token: 106 'reserved_103' is not marked as EOG
load: control token: 107 'reserved_104' is not marked as EOG
load: control token: 108 'reserved_105' is not marked as EOG
load: control token: 109 'reserved_106' is not marked as EOG
load: control token: 110 'reserved_107' is not marked as EOG
load: control token: 111 'reserved_108' is not marked as EOG
load: control token: 123 'reserved_120' is not marked as EOG
load: control token: 124 'reserved_121' is not marked as EOG
load: control token: 125 'reserved_122' is not marked as EOG
load: control token: 126 'reserved_123' is not marked as EOG
load: control token: 127 'reserved_124' is not marked as EOG
load: control token: 128 'reserved_125' is not marked as EOG
load: control token: 129 'reserved_126' is not marked as EOG
load: control token: 130 'reserved_127' is not marked as EOG
load: control token: 131 'reserved_128' is not marked as EOG
load: control token: 132 'reserved_129' is not marked as EOG
load: control token: 133 'reserved_130' is not marked as EOG
load: control token: 134 'reserved_131' is not marked as EOG
load: control token: 135 'reserved_132' is not marked as EOG
load: control token: 136 'reserved_133' is not marked as EOG
load: control token: 137 'reserved_134' is not marked as EOG
load: control token: 138 'reserved_135' is not marked as EOG
load: control token: 139 'reserved_136' is not marked as EOG
load: control token: 140 'reserved_137' is not marked as EOG
load: control token: 141 'reserved_138' is not marked as EOG
load: control token: 142 'reserved_139' is not marked as EOG
load: control token: 143 'reserved_140' is not marked as EOG
load: control token: 144 'reserved_141' is not marked as EOG
load: control token: 145 'reserved_142' is not marked as EOG
load: control token: 146 'reserved_143' is not marked as EOG
load: control token: 147 'reserved_144' is not marked as EOG
load: control token: 148 'reserved_145' is not marked as EOG
load: control token: 149 'reserved_146' is not marked as EOG
load: control token: 150 'reserved_147' is not marked as EOG
load: control token: 151 'reserved_148' is not marked as EOG
load: control token: 152 'reserved_149' is not marked as EOG
load: control token: 153 'reserved_150' is not marked as EOG
load: control token: 154 'reserved_151' is not marked as EOG
load: control token: 155 'reserved_152' is not marked as EOG
load: control token: 156 'reserved_153' is not marked as EOG
load: control token: 157 'reserved_154' is not marked as EOG
load: control token: 158 'reserved_155' is not marked as EOG
load: control token: 159 'reserved_156' is not marked as EOG
load: control token: 160 'reserved_157' is not marked as EOG
load: control token: 161 'reserved_158' is not marked as EOG
load: control token: 162 'reserved_159' is not marked as EOG
load: control token: 171 'reserved_168' is not marked as EOG
load: control token: 172 'reserved_169' is not marked as EOG
load: control token: 173 'reserved_170' is not marked as EOG
load: control token: 174 'reserved_171' is not marked as EOG
load: control token: 175 'reserved_172' is not marked as EOG
load: control token: 176 'reserved_173' is not marked as EOG
load: control token: 177 'reserved_174' is not marked as EOG
load: control token: 178 'reserved_175' is not marked as EOG
load: control token: 179 'reserved_176' is not marked as EOG
load: control token: 180 'reserved_177' is not marked as EOG
load: control token: 181 'reserved_178' is not marked as EOG
load: control token: 182 'reserved_179' is not marked as EOG
load: control token: 183 'reserved_180' is not marked as EOG
load: control token: 184 'reserved_181' is not marked as EOG
load: control token: 187 'reserved_184' is not marked as EOG
load: control token: 188 'reserved_185' is not marked as EOG
load: control token: 189 'reserved_186' is not marked as EOG
load: control token: 190 'reserved_187' is not marked as EOG
load: control token: 193 'reserved_190' is not marked as EOG
load: control token: 194 'reserved_191' is not marked as EOG
load: control token: 195 'reserved_192' is not marked as EOG
load: control token: 196 'reserved_193' is not marked as EOG
load: control token: 199 'reserved_196' is not marked as EOG
load: control token: 200 'reserved_197' is not marked as EOG
load: control token: 203 'reserved_200' is not marked as EOG
load: control token: 204 'reserved_201' is not marked as EOG
load: control token: 205 'reserved_202' is not marked as EOG
load: control token: 206 'reserved_203' is not marked as EOG
load: control token: 207 'reserved_204' is not marked as EOG
load: control token: 208 'reserved_205' is not marked as EOG
load: control token: 209 'reserved_206' is not marked as EOG
load: control token: 210 'reserved_207' is not marked as EOG
load: control token: 211 'reserved_208' is not marked as EOG
load: control token: 212 'reserved_209' is not marked as EOG
load: control token: 213 'reserved_210' is not marked as EOG
load: control token: 214 'reserved_211' is not marked as EOG
load: control token: 215 'reserved_212' is not marked as EOG
load: control token: 216 'reserved_213' is not marked as EOG
load: control token: 217 'reserved_214' is not marked as EOG
load: control token: 218 'reserved_215' is not marked as EOG
load: control token: 219 'reserved_216' is not marked as EOG
load: control token: 220 'reserved_217' is not marked as EOG
load: control token: 221 'reserved_218' is not marked as EOG
load: control token: 222 'reserved_219' is not marked as EOG
load: control token: 223 'reserved_220' is not marked as EOG
load: control token: 224 'reserved_221' is not marked as EOG
load: control token: 225 'reserved_222' is not marked as EOG
load: control token: 226 'reserved_223' is not marked as EOG
load: control token: 227 'reserved_224' is not marked as EOG
load: control token: 228 'reserved_225' is not marked as EOG
load: control token: 229 'reserved_226' is not marked as EOG
load: control token: 230 'reserved_227' is not marked as EOG
load: control token: 231 'reserved_228' is not marked as EOG
load: control token: 232 'reserved_229' is not marked as EOG
load: control token: 233 'reserved_230' is not marked as EOG
load: control token: 234 'reserved_231' is not marked as EOG
load: control token: 235 'reserved_232' is not marked as EOG
load: control token: 236 'reserved_233' is not marked as EOG
load: control token: 239 'reserved_236' is not marked as EOG
load: control token: 240 'reserved_237' is not marked as EOG
load: control token: 243 'reserved_240' is not marked as EOG
load: control token: 244 'reserved_241' is not marked as EOG
load: control token: 245 'reserved_242' is not marked as EOG
load: control token: 246 'reserved_243' is not marked as EOG
load: control token: 247 'reserved_244' is not marked as EOG
load: control token: 248 'reserved_245' is not marked as EOG
load: control token: 249 'reserved_246' is not marked as EOG
load: control token: 250 'reserved_247' is not marked as EOG
load: control token: 251 'reserved_248' is not marked as EOG
load: control token: 252 'reserved_249' is not marked as EOG
load: control token: 253 'reserved_250' is not marked as EOG
load: control token: 254 'reserved_251' is not marked as EOG
load: control token: 255 'reserved_252' is not marked as EOG
load: control token: 256 'reserved_253' is not marked as EOG
load: control token: 257 'reserved_254' is not marked as EOG
load: control token: 258 'reserved_255' is not marked as EOG
load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
load: special tokens cache size = 259
load: token to piece cache size = 0.3670 MB
print_info: arch = llama
print_info: vocab_only = 0
print_info: n_ctx_train = 4096
print_info: n_embd = 2048
print_info: n_layer = 36
print_info: n_head = 16
print_info: n_head_kv = 16
print_info: n_rot = 128
print_info: n_swa = 0
print_info: n_embd_head_k = 128
print_info: n_embd_head_v = 128
print_info: n_gqa = 1
print_info: n_embd_k_gqa = 2048
print_info: n_embd_v_gqa = 2048
print_info: f_norm_eps = 0.0e+00
print_info: f_norm_rms_eps = 1.0e-06
print_info: f_clamp_kqv = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale = 0.0e+00
print_info: n_ff = 5888
print_info: n_expert = 0
print_info: n_expert_used = 0
print_info: causal attn = 1
print_info: pooling type = 0
print_info: rope type = 0
print_info: rope scaling = linear
print_info: freq_base_train = 10000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn = 4096
print_info: rope_finetuned = unknown
print_info: ssm_d_conv = 0
print_info: ssm_d_inner = 0
print_info: ssm_d_state = 0
print_info: ssm_dt_rank = 0
print_info: ssm_dt_b_c_rms = 0
print_info: model type = 8B
print_info: model params = 2.17 B
print_info: general.name = Index-1.9B-Character_test
print_info: vocab type = SPM
print_info: n_vocab = 65029
print_info: n_merges = 0
print_info: BOS token = 1 '<s>'
print_info: EOS token = 2 '</s>'
print_info: UNK token = 0 '<unk>'
print_info: PAD token = 0 '<unk>'
print_info: LF token = 270 '<0x0A>'
print_info: EOG token = 2 '</s>'
print_info: max token length = 48
load_tensors: tensor 'token_embd.weight' (q8_0) (and 326 others) cannot be used with preferred buffer type CPU_AARCH64, using CPU instead
load_tensors: offloading 0 repeating layers to GPU
load_tensors: offloaded 0/37 layers to GPU
load_tensors: CPU_Mapped model buffer size = 2202.09 MiB
llama_init_from_model: n_seq_max = 1
llama_init_from_model: n_ctx = 1024
llama_init_from_model: n_ctx_per_seq = 1024
llama_init_from_model: n_batch = 1024
llama_init_from_model: n_ubatch = 512
llama_init_from_model: flash_attn = 0
llama_init_from_model: freq_base = 10000.0
llama_init_from_model: freq_scale = 1
llama_init_from_model: n_ctx_per_seq (1024) < n_ctx_train (4096) -- the full capacity of the model will not be utilized
llama_kv_cache_init: kv_size = 1024, offload = 1, type_k = 'f16', type_v = 'f16', n_layer = 36, can_shift = 1
llama_kv_cache_init: layer 0: n_embd_k_gqa = 2048, n_embd_v_gqa = 2048
llama_kv_cache_init: layer 1: n_embd_k_gqa = 2048, n_embd_v_gqa = 2048
llama_kv_cache_init: layer 2: n_embd_k_gqa = 2048, n_embd_v_gqa = 2048
llama_kv_cache_init: layer 3: n_embd_k_gqa = 2048, n_embd_v_gqa = 2048
llama_kv_cache_init: layer 4: n_embd_k_gqa = 2048, n_embd_v_gqa = 2048
llama_kv_cache_init: layer 5: n_embd_k_gqa = 2048, n_embd_v_gqa = 2048
llama_kv_cache_init: layer 6: n_embd_k_gqa = 2048, n_embd_v_gqa = 2048
llama_kv_cache_init: layer 7: n_embd_k_gqa = 2048, n_embd_v_gqa = 2048
llama_kv_cache_init: layer 8: n_embd_k_gqa = 2048, n_embd_v_gqa = 2048
llama_kv_cache_init: layer 9: n_embd_k_gqa = 2048, n_embd_v_gqa = 2048
llama_kv_cache_init: layer 10: n_embd_k_gqa = 2048, n_embd_v_gqa = 2048
llama_kv_cache_init: layer 11: n_embd_k_gqa = 2048, n_embd_v_gqa = 2048
llama_kv_cache_init: layer 12: n_embd_k_gqa = 2048, n_embd_v_gqa = 2048
llama_kv_cache_init: layer 13: n_embd_k_gqa = 2048, n_embd_v_gqa = 2048
llama_kv_cache_init: layer 14: n_embd_k_gqa = 2048, n_embd_v_gqa = 2048
llama_kv_cache_init: layer 15: n_embd_k_gqa = 2048, n_embd_v_gqa = 2048
llama_kv_cache_init: layer 16: n_embd_k_gqa = 2048, n_embd_v_gqa = 2048
llama_kv_cache_init: layer 17: n_embd_k_gqa = 2048, n_embd_v_gqa = 2048
llama_kv_cache_init: layer 18: n_embd_k_gqa = 2048, n_embd_v_gqa = 2048
llama_kv_cache_init: layer 19: n_embd_k_gqa = 2048, n_embd_v_gqa = 2048
llama_kv_cache_init: layer 20: n_embd_k_gqa = 2048, n_embd_v_gqa = 2048
llama_kv_cache_init: layer 21: n_embd_k_gqa = 2048, n_embd_v_gqa = 2048
llama_kv_cache_init: layer 22: n_embd_k_gqa = 2048, n_embd_v_gqa = 2048
llama_kv_cache_init: layer 23: n_embd_k_gqa = 2048, n_embd_v_gqa = 2048
llama_kv_cache_init: layer 24: n_embd_k_gqa = 2048, n_embd_v_gqa = 2048
llama_kv_cache_init: layer 25: n_embd_k_gqa = 2048, n_embd_v_gqa = 2048
llama_kv_cache_init: layer 26: n_embd_k_gqa = 2048, n_embd_v_gqa = 2048
llama_kv_cache_init: layer 27: n_embd_k_gqa = 2048, n_embd_v_gqa = 2048
llama_kv_cache_init: layer 28: n_embd_k_gqa = 2048, n_embd_v_gqa = 2048
llama_kv_cache_init: layer 29: n_embd_k_gqa = 2048, n_embd_v_gqa = 2048
llama_kv_cache_init: layer 30: n_embd_k_gqa = 2048, n_embd_v_gqa = 2048
llama_kv_cache_init: layer 31: n_embd_k_gqa = 2048, n_embd_v_gqa = 2048
llama_kv_cache_init: layer 32: n_embd_k_gqa = 2048, n_embd_v_gqa = 2048
llama_kv_cache_init: layer 33: n_embd_k_gqa = 2048, n_embd_v_gqa = 2048
llama_kv_cache_init: layer 34: n_embd_k_gqa = 2048, n_embd_v_gqa = 2048
llama_kv_cache_init: layer 35: n_embd_k_gqa = 2048, n_embd_v_gqa = 2048
llama_kv_cache_init: CPU KV buffer size = 288.00 MiB
llama_init_from_model: KV self size = 288.00 MiB, K (f16): 144.00 MiB, V (f16): 144.00 MiB
llama_init_from_model: CPU output buffer size = 0.25 MiB
llama_init_from_model: CUDA0 compute buffer size = 265.96 MiB
llama_init_from_model: CUDA_Host compute buffer size = 10.01 MiB
llama_init_from_model: graph nodes = 1158
llama_init_from_model: graph splits = 400 (with bs=512), 1 (with bs=1)
common_init_from_params: setting dry_penalty_last_n to ctx_size = 1024
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
srv init: initializing slots, n_slots = 1
slot init: id 0 | task -1 | new slot n_ctx_slot = 1024
slot reset: id 0 | task -1 |
main: model loaded
main: The chat template that comes with this model is not yet supported, falling back to chatml. This may cause the model to output suboptimal responses
Close code: 3221226505