Skip to content

Conversation

@DN6
Copy link
Collaborator

@DN6 DN6 commented Jul 16, 2024

What does this PR do?

We're experiencing some issues reading/writing to the mounted cache. In this PR we

  1. Remove the use of the mounted cache in favour of using HF Transfer and downloading the models to the default cache inside the container for every job. This won't provide too much of a slow down on tests as we tend to use just a few models across multiple slow tests. e.g. Runway's SD 1.5 is used in almost all SD slow tests. So only a few downloads will happen per job. Additionally, reading/writing from the default cache inside the container is much faster that using the mounted cache. So we should see some speed ups in load times for pipelines.

  2. Move all our slow tests with checkpoints to the nightly tests. We usually only consider the latest slow tests when identifying errors. Therefore we don't necessarily need to run checkpoint tests on every merge. It's also a bit more practical/actionable since we will get only a single set of notifications per day related to test failures.

  3. Only run Fast/Fast GPU tests on merge. This will speed up the merge tests quite significantly.

  4. Move log_reports.py script into the utils folder so it lives with our other CI utils.

Fixes # (issue)

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

matrix:
module: [models, schedulers, others, examples]
max-parallel: 2
module: [models, schedulers, lora, others, single_file, examples]
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add Lora tests here.

pip install slack_sdk tabulate
python scripts/log_reports.py >> $GITHUB_STEP_SUMMARY
run_lora_nightly_tests:
Copy link
Collaborator Author

@DN6 DN6 Jul 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We run LoRA tests in the Nightly Torch CUDA Tests job since PEFT is a needed dependency for LoRA loading. We don't need a job dedicated PEFT job anymore. LoRA Tests == PEFT Tests basically.

name: torch_cuda_test_reports
path: reports

peft_cuda_tests:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not needed as LoRA Tests require PEFT. We can just run the LoRA tests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The LoRA tests are basically PEFT tests, no?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup

@DN6 DN6 requested a review from sayakpaul July 16, 2024 08:28
Copy link
Member

@sayakpaul sayakpaul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a couple of suggestions. I am not sure if removing LoRA related tests from push_tests.yml is a good idea though.

python -m uv pip install accelerate@git+https://github.com/huggingface/accelerate.git
python -m uv pip install pytest-reportlog
python -m uv pip install hf_transfer
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's have this installed in our Dockerfile installed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LoRA tests still run here

module: [models, schedulers, lora, others, single_file]

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added installing PEFT from source as well.

name: torch_cuda_test_reports
path: reports

peft_cuda_tests:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The LoRA tests are basically PEFT tests, no?

@DN6
Copy link
Collaborator Author

DN6 commented Jul 17, 2024

Not entirely sure what's happening with the tests here. They pass locally.

@DN6 DN6 merged commit 5fbb4d3 into main Jul 25, 2024
sayakpaul pushed a commit that referenced this pull request Dec 23, 2024
* update

* update

* update
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants