Improve the performance and suitable for NPU computing #9631

leisuzz · 2024-10-10T03:03:13Z

What does this PR do?

Improves training performance (FPS) and makes the code suitable for NPU computing.
Selects the memory-freeing call depending on whether the device is CUDA or NPU.
Adds FlashAttention for NPU to the attention processor.
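A minimal sketch of what "selecting how to free memory for CUDA or NPU" could look like. The helper name `free_memory` and its return values are hypothetical, not this PR's code. The NPU branch assumes the Ascend `torch_npu` extension is installed and exposes `torch_npu.npu.empty_cache()`; both `torch` and `torch_npu` are imported lazily so CPU-only callers need neither.

```python
import gc


def free_memory(device_type: str) -> str:
    """Hypothetical helper: release cached accelerator memory for `device_type`.

    Returns the backend whose cache was cleared ("cuda" or "npu"), or "none".
    """
    # Collect unreachable Python objects first so their tensors can be released.
    gc.collect()
    if device_type == "cuda":
        import torch  # lazy import: only needed on the CUDA path

        if torch.cuda.is_available():
            torch.cuda.empty_cache()
            return "cuda"
        return "none"
    if device_type == "npu":
        # Assumption: torch_npu provides a CUDA-like `npu` namespace.
        import torch_npu

        torch_npu.npu.empty_cache()
        return "npu"
    # Other backends (CPU, or ones this sketch does not handle): nothing is cleared.
    return "none"
```

In a training script this would typically be called as `free_memory(accelerator.device.type)`, since `torch.device.type` is the string `"cuda"`, `"npu"` or `"cpu"`.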
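FlashAttention-style kernels compute exactly the same result as ordinary scaled dot-product attention; the gain comes from processing keys and values in tiles with a running ("online") softmax, so the full score matrix is never materialized. The pure-Python sketch below illustrates that rescaling trick for a single query vector and checks it against the naive formula. It is not the NPU fused-attention kernel itself, and the function names are illustrative only.

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def naive_attention(q: Sequence[float], keys: List[List[float]], values: List[List[float]]) -> List[float]:
    """Reference softmax(q . K^T / sqrt(d)) . V, computed in one shot."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [dot(q, k) * scale for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(dim)]


def streaming_attention(q: Sequence[float], keys: List[List[float]], values: List[List[float]],
                        block_size: int = 2) -> List[float]:
    """Same result, computed block by block with a running max and running sum."""
    scale = 1.0 / math.sqrt(len(q))
    dim = len(values[0])
    running_max = -math.inf
    running_sum = 0.0
    acc = [0.0] * dim  # un-normalized weighted sum of value vectors
    for start in range(0, len(keys), block_size):
        block_k = keys[start:start + block_size]
        block_v = values[start:start + block_size]
        scores = [dot(q, k) * scale for k in block_k]
        new_max = max(running_max, max(scores))
        # Rescale everything accumulated so far to the new maximum
        # (exp(-inf) == 0.0 on the first block, so nothing breaks).
        correction = math.exp(running_max - new_max)
        running_sum *= correction
        acc = [a * correction for a in acc]
        for s, v in zip(scores, block_v):
            w = math.exp(s - new_max)
            running_sum += w
            acc = [a + w * vj for a, vj in zip(acc, v)]
        running_max = new_max
    return [a / running_sum for a in acc]
```

The block size only changes how much work is held in memory at once, not the output, which is why a fused kernel can be swapped in for the standard attention path.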

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline?
Did you read our philosophy doc (important for complex PRs)?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

a-r-r-o-w · 2024-10-10T22:07:21Z

examples/text_to_image/train_text_to_image_sdxl.py

        model_input = vae.encode(pixel_values).latent_dist.sample()
    model_input = model_input * vae.config.scaling_factor
-    return {"model_input": model_input.cpu()}
+    return {"model_input": accelerator.gather(model_input)}


Sorry if my question is stupid, but why do we need to gather here? Doesn't this cause a sync between all ranks, as opposed to only an NPU-to-CPU memory sync, making it slower overall?
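To make the question concrete, here is a pure-Python model of the two return values. It assumes Accelerate's documented `gather` semantics: it is a collective call that every process must reach, and each process receives all processes' tensors concatenated along the first dimension, still on the accelerator. `.cpu()` only copies the local batch to host memory. The function names and sample IDs are hypothetical stand-ins for latent tensors.

```python
from typing import Dict, List


def simulate_cpu_offload(per_rank_latents: Dict[int, List[str]]) -> Dict[int, List[str]]:
    """Model of `.cpu()`: each rank keeps only its own batch, now in host memory."""
    return {rank: list(batch) for rank, batch in per_rank_latents.items()}


def simulate_gather(per_rank_latents: Dict[int, List[str]]) -> Dict[int, List[str]]:
    """Model of `accelerator.gather()`: every rank must participate, and every rank
    ends up holding the concatenation of all ranks' batches (along dim 0)."""
    world = sorted(per_rank_latents)
    everything = [item for rank in world for item in per_rank_latents[rank]]
    return {rank: list(everything) for rank in world}
```

With two ranks and a batch of two each, `.cpu()` leaves two samples per rank, while `gather` leaves four per rank on device memory, so per-rank memory grows with the number of processes in addition to the cross-rank synchronization.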

a-r-r-o-w · 2024-10-10T22:07:28Z

examples/text_to_image/train_text_to_image_sdxl.py

                    add_time_ids = list(original_size + crops_coords_top_left + target_size)
-                    add_time_ids = torch.tensor([add_time_ids])
-                    add_time_ids = add_time_ids.to(accelerator.device, dtype=weight_dtype)
+                    add_time_ids = torch.tensor([add_time_ids], device=accelerator.device, dtype=weight_dtype)
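For context on the hunk above: `add_time_ids` concatenates three 2-tuples (original size, crop top-left, target size) into SDXL's six micro-conditioning values, wrapped in an outer list for a batch dimension of 1. A minimal sketch of the list-building step, using a hypothetical helper name; the tensor call is shown only as a comment, since both the old two-step form and the new single `torch.tensor(..., device=..., dtype=...)` call produce the same values, device, and dtype.

```python
from typing import List, Tuple

Size = Tuple[int, int]


def build_add_time_ids(original_size: Size, crops_coords_top_left: Size,
                       target_size: Size) -> List[List[int]]:
    """Hypothetical helper: the six SDXL time ids with a leading batch dim of 1."""
    add_time_ids = list(original_size + crops_coords_top_left + target_size)
    # The script then converts this nested list in one call:
    #   torch.tensor([add_time_ids], device=accelerator.device, dtype=weight_dtype)
    # instead of torch.tensor([add_time_ids]) followed by .to(device, dtype=...).
    return [add_time_ids]
```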


a-r-r-o-w · 2024-10-10T22:08:18Z

src/diffusers/models/attention_processor.py


-        hidden_states = hidden_states.transpose(1, 2).reshape(batch_size, -1, attn.heads * head_dim)
-        hidden_states = hidden_states.to(query.dtype)
+        hidden_states = hidden_states.transpose(1, 2).reshape(batch_size, -1, attn.heads * head_dim).to(query.dtype)


Why do we need this change? For improvements to the library, feel free to open a separate PR :)
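Both versions of this hunk compute the same thing; only the line layout differs, because `.to(query.dtype)` changes the dtype and not the shape. The shape arithmetic is sketched below, assuming (as is typical for these processors) that the attention output has layout `(batch, heads, seq_len, head_dim)`. The helper functions are hypothetical and operate on shape tuples only.

```python
from typing import Tuple

Shape = Tuple[int, ...]


def transpose_1_2(shape: Shape) -> Shape:
    """Shape after tensor.transpose(1, 2): swap axes 1 and 2."""
    dims = list(shape)
    dims[1], dims[2] = dims[2], dims[1]
    return tuple(dims)


def reshape_merge_heads(shape: Shape, batch_size: int, heads: int, head_dim: int) -> Shape:
    """Shape after .reshape(batch_size, -1, heads * head_dim), resolving the -1."""
    total = 1
    for d in shape:
        total *= d
    inner = heads * head_dim
    if total % (batch_size * inner) != 0:
        raise ValueError(f"cannot reshape {shape} to ({batch_size}, -1, {inner})")
    return (batch_size, total // (batch_size * inner), inner)
```

For example, an output of shape `(2, 8, 77, 64)` becomes `(2, 77, 8, 64)` after the transpose and `(2, 77, 512)` after merging the head axes.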

a-r-r-o-w · 2024-10-10T22:09:30Z

cc @sayakpaul for the training scripts and as the original author of the SDXL script

sayakpaul · 2024-10-11T08:07:03Z

@leisuzz any reason for closing the PR?

leisuzz · 2024-10-11T08:24:58Z

@sayakpaul Sorry, there were conflicts in the commit, so I created two new PRs: #9642 and #9640.

a-r-r-o-w reviewed Oct 10, 2024


a-r-r-o-w requested a review from sayakpaul October 10, 2024 22:09

leisuzz force-pushed the main branch from 2f50c6c to 0ef20e7 on October 11, 2024 01:57

leisuzz closed this Oct 11, 2024

leisuzz force-pushed the main branch from 0ef20e7 to 38a3e4d on October 11, 2024 01:58


3 participants
