[CI] Slow Test Updates #8870
Conversation
    matrix:
      module: [models, schedulers, others, examples]
    max-parallel: 2

Suggested change:
      module: [models, schedulers, lora, others, single_file, examples]
Add LoRA tests here.
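As a sketch, the suggested module list above would sit inside a strategy block roughly like this (the max-parallel value comes from the diff; the fail-fast key is an assumption, not part of the diff):

```yaml
strategy:
  fail-fast: false          # assumption: keep running other modules if one fails
  max-parallel: 2
  matrix:
    module: [models, schedulers, lora, others, single_file, examples]
```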
      pip install slack_sdk tabulate
      python scripts/log_reports.py >> $GITHUB_STEP_SUMMARY

  run_lora_nightly_tests:
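For context on what the reporting step consumes: pytest-reportlog writes one JSON object per line, and a script like log_reports.py can filter those records for failures. A minimal sketch of that kind of parsing (the summarize_report helper is hypothetical, not the actual script):

```python
import json

def summarize_report(lines):
    """Collect failed test ids from pytest-reportlog JSONL records.

    Each line is a JSON object; "TestReport" records with outcome
    "failed" during the "call" phase correspond to failing tests.
    """
    failures = []
    for line in lines:
        record = json.loads(line)
        if (
            record.get("$report_type") == "TestReport"
            and record.get("when") == "call"
            and record.get("outcome") == "failed"
        ):
            failures.append(record["nodeid"])
    return failures

# Example records shaped like pytest-reportlog output
sample = [
    '{"$report_type": "TestReport", "when": "call", "outcome": "passed", "nodeid": "tests/lora/test_a.py::test_ok"}',
    '{"$report_type": "TestReport", "when": "call", "outcome": "failed", "nodeid": "tests/lora/test_b.py::test_bad"}',
]
print(summarize_report(sample))  # → ['tests/lora/test_b.py::test_bad']
```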
We run the LoRA tests in the Nightly Torch CUDA Tests job since PEFT is a required dependency for LoRA loading. We no longer need a dedicated PEFT job; LoRA tests are effectively PEFT tests.
      name: torch_cuda_test_reports
      path: reports

  peft_cuda_tests:
Not needed as LoRA Tests require PEFT. We can just run the LoRA tests.
The LoRA tests are basically PEFT tests, no?
Yup
sayakpaul left a comment:
Left a couple of suggestions. I am not sure if removing LoRA-related tests from push_tests.yml is a good idea, though.
.github/workflows/nightly_tests.yml (Outdated)
      python -m uv pip install accelerate@git+https://github.com/huggingface/accelerate.git
      python -m uv pip install pytest-reportlog
      python -m uv pip install hf_transfer
Let's have this installed in our Dockerfile.
LoRA tests still run here:

diffusers/.github/workflows/push_tests.yml, line 113 (96b0e1d):
      module: [models, schedulers, lora, others, single_file]
I added installing PEFT from source as well.
Not entirely sure what's happening with the tests here. They pass locally.
What does this PR do?
We're experiencing some issues reading/writing to the mounted cache. In this PR we:

- Remove the use of the mounted cache in favour of using HF Transfer and downloading the models to the default cache inside the container for every job. This shouldn't slow the tests down much, as we tend to reuse just a few models across multiple slow tests (e.g. Runway's SD 1.5 is used in almost all SD slow tests), so only a few downloads will happen per job. Additionally, reading/writing from the default cache inside the container is much faster than using the mounted cache, so we should see some speed-ups in pipeline load times.
- Move all our slow tests with checkpoints to the nightly tests. We usually only consider the latest slow-test run when identifying errors, so we don't necessarily need to run checkpoint tests on every merge. It's also more practical/actionable, since we will get only a single set of test-failure notifications per day.
- Only run Fast/Fast GPU tests on merge. This will speed up the merge tests quite significantly.
- Move the log_reports.py script into the utils folder so it lives with our other CI utils.

Fixes # (issue)
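To illustrate the HF Transfer point above: HF_HUB_ENABLE_HF_TRANSFER=1 is the real huggingface_hub switch that routes downloads through the hf_transfer package, but the job_env helper here is purely hypothetical, a sketch of how a CI job's environment might be assembled:

```python
import os

def job_env(base=None):
    """Build the environment for a slow-test job (hypothetical helper).

    Enables HF Transfer so model downloads go to the default cache
    inside the container at full speed. The hf_transfer package must be
    installed, and the variable must be set before huggingface_hub is
    imported by the test process.
    """
    env = dict(base if base is not None else os.environ)
    env["HF_HUB_ENABLE_HF_TRANSFER"] = "1"
    return env

print(job_env({})["HF_HUB_ENABLE_HF_TRANSFER"])  # → 1
```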
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.