
Commit 454769c

dg845 authored and tolgacangoz committed
Update Wan Animate Docs (#12658)
* Update the Wan Animate docs to reflect the most recent code
* Further explain input preprocessing and link to original Wan Animate preprocessing scripts
1 parent a5cff8b commit 454769c

File tree: 2 files changed, +19 −31 lines


docs/source/en/api/models/wan_animate_transformer_3d.md

Lines changed: 1 addition & 1 deletion
@@ -18,7 +18,7 @@ The model can be loaded with the following code snippet.
 ```python
 from diffusers import WanAnimateTransformer3DModel

-transformer = WanAnimateTransformer3DModel.from_pretrained("Wan-AI/Wan2.2-Animate-14B-720P-Diffusers", subfolder="transformer", torch_dtype=torch.bfloat16)
+transformer = WanAnimateTransformer3DModel.from_pretrained("Wan-AI/Wan2.2-Animate-14B-Diffusers", subfolder="transformer", torch_dtype=torch.bfloat16)
 ```

 ## WanAnimateTransformer3DModel
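For context, here is a minimal sketch (not part of this commit) of plugging the separately loaded transformer into the pipeline documented in the second file below; the `transformer=` override follows the usual `diffusers` `from_pretrained` convention and is an assumption, not something shown in this diff:

```python
import torch
from diffusers import WanAnimatePipeline, WanAnimateTransformer3DModel

model_id = "Wan-AI/Wan2.2-Animate-14B-Diffusers"

# Load the transformer on its own (e.g. to pick its dtype explicitly) ...
transformer = WanAnimateTransformer3DModel.from_pretrained(
    model_id, subfolder="transformer", torch_dtype=torch.bfloat16
)

# ... then hand it to the pipeline instead of letting the pipeline load it itself.
pipe = WanAnimatePipeline.from_pretrained(model_id, transformer=transformer, torch_dtype=torch.bfloat16)
pipe.to("cuda")
```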

docs/source/en/api/pipelines/wan.md

Lines changed: 18 additions & 30 deletions
@@ -405,7 +405,7 @@ For replacement mode, you additionally need:
 - **Mask video**: A mask indicating where to generate content (white) vs. preserve original (black)

 > [!NOTE]
-> The preprocessing tools are available in the original Wan-Animate repository. Integration of these preprocessing steps into Diffusers is planned for a future release.
+> Raw videos should not be used for inputs such as `pose_video`, which the pipeline expects to be preprocessed to extract the proper information. Preprocessing scripts to prepare these inputs are available in the [original Wan-Animate repository](https://github.com/Wan-Video/Wan2.2?tab=readme-ov-file#1-preprocessing). Integration of these preprocessing steps into Diffusers is planned for a future release.

 The example below demonstrates how to use the Wan-Animate pipeline:

@@ -417,13 +417,10 @@ import numpy as np
 import torch
 from diffusers import AutoencoderKLWan, WanAnimatePipeline
 from diffusers.utils import export_to_video, load_image, load_video
-from transformers import CLIPVisionModel

 model_id = "Wan-AI/Wan2.2-Animate-14B-Diffusers"
 vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
-pipe = WanAnimatePipeline.from_pretrained(
-    model_id, vae=vae, torch_dtype=torch.bfloat16
-)
+pipe = WanAnimatePipeline.from_pretrained(model_id, vae=vae, torch_dtype=torch.bfloat16)
 pipe.to("cuda")

 # Load character image and preprocessed videos
@@ -454,11 +451,11 @@ output = pipe(
     negative_prompt=negative_prompt,
     height=height,
     width=width,
-    num_frames=81,
-    guidance_scale=5.0,
-    mode="animation", # Animation mode (default)
+    segment_frame_length=77,
+    guidance_scale=1.0,
+    mode="animate", # Animation mode (default)
 ).frames[0]
-export_to_video(output, "animated_character.mp4", fps=16)
+export_to_video(output, "animated_character.mp4", fps=30)
 ```

 </hfoption>
@@ -469,14 +466,10 @@ import numpy as np
 import torch
 from diffusers import AutoencoderKLWan, WanAnimatePipeline
 from diffusers.utils import export_to_video, load_image, load_video
-from transformers import CLIPVisionModel

 model_id = "Wan-AI/Wan2.2-Animate-14B-Diffusers"
-image_encoder = CLIPVisionModel.from_pretrained(model_id, subfolder="image_encoder", torch_dtype=torch.float16)
 vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
-pipe = WanAnimatePipeline.from_pretrained(
-    model_id, vae=vae, image_encoder=image_encoder, torch_dtype=torch.bfloat16
-)
+pipe = WanAnimatePipeline.from_pretrained(model_id, vae=vae, torch_dtype=torch.bfloat16)
 pipe.to("cuda")

 # Load all required inputs for replacement mode
@@ -511,11 +504,11 @@ output = pipe(
     negative_prompt=negative_prompt,
     height=height,
     width=width,
-    num_frames=81,
-    guidance_scale=5.0,
-    mode="replacement", # Replacement mode
+    segment_frame_length=77,
+    guidance_scale=1.0,
+    mode="replace", # Replacement mode
 ).frames[0]
-export_to_video(output, "character_replaced.mp4", fps=16)
+export_to_video(output, "character_replaced.mp4", fps=30)
 ```

 </hfoption>
@@ -526,14 +519,10 @@ import numpy as np
 import torch
 from diffusers import AutoencoderKLWan, WanAnimatePipeline
 from diffusers.utils import export_to_video, load_image, load_video
-from transformers import CLIPVisionModel

 model_id = "Wan-AI/Wan2.2-Animate-14B-Diffusers"
-image_encoder = CLIPVisionModel.from_pretrained(model_id, subfolder="image_encoder", torch_dtype=torch.float16)
 vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
-pipe = WanAnimatePipeline.from_pretrained(
-    model_id, vae=vae, image_encoder=image_encoder, torch_dtype=torch.bfloat16
-)
+pipe = WanAnimatePipeline.from_pretrained(model_id, vae=vae, torch_dtype=torch.bfloat16)
 pipe.to("cuda")

 image = load_image("path/to/character.jpg")
@@ -567,25 +556,24 @@ output = pipe(
     negative_prompt=negative_prompt,
     height=height,
     width=width,
-    num_frames=81,
+    segment_frame_length=77,
     num_inference_steps=50,
     guidance_scale=5.0,
-    num_frames_for_temporal_guidance=5, # Use 5 frames for temporal guidance (1 or 5 recommended)
+    prev_segment_conditioning_frames=5, # Use 5 frames for temporal guidance (1 or 5 recommended)
     callback_on_step_end=callback_fn,
     callback_on_step_end_tensor_inputs=["latents"],
 ).frames[0]
-export_to_video(output, "animated_advanced.mp4", fps=16)
+export_to_video(output, "animated_advanced.mp4", fps=30)
 ```

 </hfoption>
 </hfoptions>

 #### Key Parameters

-- **mode**: Choose between `"animation"` (default) or `"replacement"`
-- **num_frames_for_temporal_guidance**: Number of frames for temporal guidance (1 or 5 recommended). Using 5 provides better temporal consistency but requires more memory
-- **guidance_scale**: Controls how closely the output follows the text prompt. Higher values (5-7) produce results more aligned with the prompt
-- **num_frames**: Total number of frames to generate. Should be divisible by `vae_scale_factor_temporal` (default: 4)
+- **mode**: Choose between `"animate"` (default) or `"replace"`
+- **prev_segment_conditioning_frames**: Number of frames for temporal guidance (1 or 5 recommended). Using 5 provides better temporal consistency but requires more memory
+- **guidance_scale**: Controls how closely the output follows the text prompt. Higher values (5-7) produce results more aligned with the prompt. For Wan-Animate, CFG is disabled by default (`guidance_scale=1.0`) but can be enabled to support negative prompts and finer control over facial expressions. (Note that CFG will only target the text prompt and face conditioning.)


 ## Notes
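As a quick illustration of the renamed parameters described in the Key Parameters list (this sketch is not part of the commit diff above), a call with CFG enabled might look like the following; `pipe`, the prompts, and the preprocessed conditioning inputs are assumed to be set up as in the examples earlier, and the conditioning inputs themselves are deliberately elided rather than guessed:

```python
output = pipe(
    # ... character image and preprocessed conditioning videos (e.g. `pose_video`)
    #     go here, exactly as in the mode-specific examples above ...
    prompt=prompt,
    negative_prompt=negative_prompt,     # only takes effect when CFG is enabled
    height=height,
    width=width,
    segment_frame_length=77,
    prev_segment_conditioning_frames=5,  # 1 or 5 recommended
    guidance_scale=5.0,                  # values > 1.0 enable CFG over the text prompt and face conditioning
    mode="animate",                      # or "replace" for replacement mode
).frames[0]
export_to_video(output, "wan_animate_example.mp4", fps=30)
```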
