Description
Git commit
Operating System & Version
Windows 10
GGML backends
HIP, Vulkan
Command-line arguments used
.\buildhip\bin\sd.exe -m ..\ComfyUI\models\checkpoints\sdxl\dreamshaperXL_v21TurboDPMSDE_q8_0.gguf -p "Hello lora:sdxl/1970ChevelleSS_SDXL_v01:1" --lora-model-dir ..\ComfyUI\models\loras\ -v --lora-apply-mode auto
Steps to reproduce
Run an SDXL model with a LoRA without setting --lora-apply-mode to immediately
What you expected to happen
No crash; at_runtime mode was supposed to prevent crashes
What actually happened
- On some (but not all) SDXL models, I get ggml/src/ggml.c:1925: GGML_ASSERT(ggml_can_repeat(b, a)) failed (ggml_add) during prompt processing, regardless of quant (even on the original f16 .safetensors with at_runtime mode).
- Other models crash with ggml/src/ggml.c:4278: GGML_ASSERT(a->ne[2] == b->ne[2]) failed (ggml_im2col) during sampling when quantized (tested q8_0 and q4_0).

Surprisingly, this does not seem to happen with SD1.x models, but it happens consistently on SDXL: I tried multiple LoRAs and always got this result.
Logs / error messages / stack trace
[DEBUG] stable-diffusion.cpp:155 - Using CUDA backend
[INFO ] stable-diffusion.cpp\ggml_extend.hpp:69 - ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
[INFO ] stable-diffusion.cpp\ggml_extend.hpp:69 - ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
[INFO ] stable-diffusion.cpp\ggml_extend.hpp:69 - ggml_cuda_init: found 1 ROCm devices:
[INFO ] stable-diffusion.cpp\ggml_extend.hpp:69 - Device 0: AMD Radeon RX 6800, gfx1030 (0x1030), VMM: no, Wave Size: 32
[INFO ] stable-diffusion.cpp:212 - loading model from '..\ComfyUI\models\checkpoints\sdxl\dreamshaperXL_v21TurboDPMSDE_q8_0.gguf'
[INFO ] model.cpp:376 - load ..\ComfyUI\models\checkpoints\sdxl\dreamshaperXL_v21TurboDPMSDE_q8_0.gguf using gguf format
[DEBUG] model.cpp:418 - init from '..\ComfyUI\models\checkpoints\sdxl\dreamshaperXL_v21TurboDPMSDE_q8_0.gguf'
[INFO ] stable-diffusion.cpp:303 - Version: SDXL
[INFO ] stable-diffusion.cpp:330 - Weight type stat: f16: 151 | q8_0: 2492
[INFO ] stable-diffusion.cpp:331 - Conditioner weight type stat: f16: 1 | q8_0: 714
[INFO ] stable-diffusion.cpp:332 - Diffusion model weight type stat: f16: 74 | q8_0: 1606
[INFO ] stable-diffusion.cpp:333 - VAE weight type stat: f16: 76 | q8_0: 172
[DEBUG] stable-diffusion.cpp:335 - ggml tensor size = 400 bytes
[DEBUG] stable-diffusion.cpp\clip.hpp:171 - vocab size: 49408
[DEBUG] stable-diffusion.cpp\clip.hpp:182 - trigger word img already in vocab
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:1842 - clip params backend buffer size = 125.22 MB(VRAM) (196 tensors)
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:1842 - clip params backend buffer size = 710.31 MB(VRAM) (517 tensors)
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:1842 - unet params backend buffer size = 2925.36 MB(VRAM) (1680 tensors)
[WARN ] stable-diffusion.cpp:544 - No VAE specified with --vae or --force-sdxl-vae-conv-scale flag set, using Conv2D scale 0.031
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:1842 - vae params backend buffer size = 94.47 MB(VRAM) (140 tensors)
[DEBUG] stable-diffusion.cpp:636 - loading weights
[DEBUG] model.cpp:1297 - using 12 threads for model loading
[DEBUG] model.cpp:1319 - loading tensors from ..\ComfyUI\models\checkpoints\sdxl\dreamshaperXL_v21TurboDPMSDE_q8_0.gguf
[INFO ] model.cpp:1554 - unknown tensor 'cond_stage_model.text_projection | q8_0 | 2 [768, 768, 1, 1, 1]' in model file
|==================================================| 2643/2643 - 1824.02it/s
[INFO ] model.cpp:1528 - loading tensors completed, taking 1.46s (process: 0.00s, read: 1.15s, memcpy: 0.00s, convert: 0.00s, copy_to_backend: 0.14s)
[INFO ] stable-diffusion.cpp:735 - total params memory size = 3855.36MB (VRAM 3855.36MB, RAM 0.00MB): text_encoders 835.53MB(VRAM), diffusion_model 2925.36MB(VRAM), vae 94.47MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
[INFO ] stable-diffusion.cpp:834 - running in eps-prediction mode
[DEBUG] stable-diffusion.cpp:846 - finished loaded file
[DEBUG] stable-diffusion.cpp:2912 - generate_image 512x512
[INFO ] stable-diffusion.cpp:3048 - TXT2IMG
[DEBUG] stable-diffusion.cpp:1142 - lora sdxl/1970ChevelleSS_SDXL_v01:1.00
[INFO ] stable-diffusion.cpp:1149 - apply at runtime
[INFO ] model.cpp:379 - load ..\ComfyUI\models\loras\sdxl/1970ChevelleSS_SDXL_v01.safetensors using safetensors format
[DEBUG] model.cpp:509 - init from '..\ComfyUI\models\loras\sdxl/1970ChevelleSS_SDXL_v01.safetensors', prefix = 'lora.'
[INFO ] stable-diffusion.cpp\lora.hpp:40 - loading LoRA from '..\ComfyUI\models\loras\sdxl/1970ChevelleSS_SDXL_v01.safetensors'
[DEBUG] model.cpp:1297 - using 12 threads for model loading
[DEBUG] model.cpp:1319 - loading tensors from ..\ComfyUI\models\loras\sdxl/1970ChevelleSS_SDXL_v01.safetensors
|==================================================| 2958/2958 - 13887.32it/s
[INFO ] model.cpp:1528 - loading tensors completed, taking 0.22s (process: 0.00s, read: 0.00s, memcpy: 0.00s, convert: 0.00s, copy_to_backend: 0.00s)
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:1842 - lora params backend buffer size = 18.97 MB(VRAM) (792 tensors)
[DEBUG] model.cpp:1297 - using 12 threads for model loading
[DEBUG] model.cpp:1319 - loading tensors from ..\ComfyUI\models\loras\sdxl/1970ChevelleSS_SDXL_v01.safetensors
|==================================================| 2958/2958 - 14643.56it/s
[INFO ] model.cpp:1528 - loading tensors completed, taking 0.22s (process: 0.01s, read: 0.00s, memcpy: 0.00s, convert: 0.00s, copy_to_backend: 0.06s)
[DEBUG] stable-diffusion.cpp\lora.hpp:93 - finished loaded lora
[INFO ] model.cpp:379 - load ..\ComfyUI\models\loras\sdxl/1970ChevelleSS_SDXL_v01.safetensors using safetensors format
[DEBUG] model.cpp:509 - init from '..\ComfyUI\models\loras\sdxl/1970ChevelleSS_SDXL_v01.safetensors', prefix = 'lora.'
[INFO ] stable-diffusion.cpp\lora.hpp:40 - loading LoRA from '..\ComfyUI\models\loras\sdxl/1970ChevelleSS_SDXL_v01.safetensors'
[DEBUG] model.cpp:1297 - using 12 threads for model loading
[DEBUG] model.cpp:1319 - loading tensors from ..\ComfyUI\models\loras\sdxl/1970ChevelleSS_SDXL_v01.safetensors
|==================================================| 2958/2958 - 13887.32it/s
[INFO ] model.cpp:1528 - loading tensors completed, taking 0.26s (process: 0.05s, read: 0.00s, memcpy: 0.00s, convert: 0.00s, copy_to_backend: 0.00s)
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:1842 - lora params backend buffer size = 79.33 MB(VRAM) (2166 tensors)
[DEBUG] model.cpp:1297 - using 12 threads for model loading
[DEBUG] model.cpp:1319 - loading tensors from ..\ComfyUI\models\loras\sdxl/1970ChevelleSS_SDXL_v01.safetensors
|==================================================| 2958/2958 - 14359.22it/s
[INFO ] model.cpp:1528 - loading tensors completed, taking 0.21s (process: 0.01s, read: 0.01s, memcpy: 0.00s, convert: 0.00s, copy_to_backend: 0.05s)
[DEBUG] stable-diffusion.cpp\lora.hpp:93 - finished loaded lora
[INFO ] model.cpp:379 - load ..\ComfyUI\models\loras\sdxl/1970ChevelleSS_SDXL_v01.safetensors using safetensors format
[DEBUG] model.cpp:509 - init from '..\ComfyUI\models\loras\sdxl/1970ChevelleSS_SDXL_v01.safetensors', prefix = 'lora.'
[INFO ] stable-diffusion.cpp\lora.hpp:40 - loading LoRA from '..\ComfyUI\models\loras\sdxl/1970ChevelleSS_SDXL_v01.safetensors'
[DEBUG] model.cpp:1297 - using 12 threads for model loading
[DEBUG] model.cpp:1319 - loading tensors from ..\ComfyUI\models\loras\sdxl/1970ChevelleSS_SDXL_v01.safetensors
|==================================================| 2958/2958 - 14643.56it/s
[INFO ] model.cpp:1528 - loading tensors completed, taking 0.21s (process: 0.00s, read: 0.00s, memcpy: 0.00s, convert: 0.00s, copy_to_backend: 0.00s)
[INFO ] stable-diffusion.cpp:1154 - apply_loras completed, taking 1.30s
[DEBUG] stable-diffusion.cpp:1155 - prompt after extract and remove lora: "Hello "
[DEBUG] stable-diffusion.cpp\conditioner.hpp:358 - parse 'Hello ' to [['Hello ', 1], ]
[DEBUG] stable-diffusion.cpp\clip.hpp:311 - token length: 77
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:1656 - clip compute buffer size: 2.26 MB(VRAM)
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:1656 - clip compute buffer size: 3.76 MB(VRAM)
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:1656 - clip compute buffer size: 3.76 MB(VRAM)
[DEBUG] stable-diffusion.cpp\conditioner.hpp:492 - computing condition graph completed, taking 316 ms
[DEBUG] stable-diffusion.cpp\conditioner.hpp:358 - parse '' to [['', 1], ]
[DEBUG] stable-diffusion.cpp\clip.hpp:311 - token length: 77
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:1656 - clip compute buffer size: 2.26 MB(VRAM)
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:1656 - clip compute buffer size: 3.76 MB(VRAM)
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:1656 - clip compute buffer size: 3.76 MB(VRAM)
[DEBUG] stable-diffusion.cpp\conditioner.hpp:492 - computing condition graph completed, taking 135 ms
[INFO ] stable-diffusion.cpp:2694 - get_learned_condition completed, taking 1755 ms
[INFO ] stable-diffusion.cpp:2712 - sampling using Euler A method
[INFO ] stable-diffusion.cpp:2806 - generating image: 1/1 - seed 42
H:/stable-diffusion.cpp/ggml/src/ggml.c:4278: GGML_ASSERT(a->ne[2] == b->ne[2]) failed
The stack trace is different for models that crash during prompt processing:
[...]
[INFO ] model.cpp:1528 - loading tensors completed, taking 0.21s (process: 0.00s, read: 0.00s, memcpy: 0.00s, convert: 0.00s, copy_to_backend: 0.00s)
[INFO ] stable-diffusion.cpp:1154 - apply_loras completed, taking 1.25s
[DEBUG] stable-diffusion.cpp:1155 - prompt after extract and remove lora: "Hello "
[DEBUG] stable-diffusion.cpp\conditioner.hpp:358 - parse 'Hello ' to [['Hello ', 1], ]
[DEBUG] stable-diffusion.cpp\clip.hpp:311 - token length: 77
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:1656 - clip compute buffer size: 2.26 MB(VRAM)
H:/stable-diffusion.cpp/ggml/src/ggml.c:1925: GGML_ASSERT(ggml_can_repeat(b, a)) failed