[0.6.0-UT] Transferring Multi-GPU tests #527
build/rocm/run_multi_gpu.sh
    --reruns 3 \
    tests/pmap_test.py
    # Multi-GPU test files
    MULTI_GPU_TESTS=(
I presume this variable must have the same definition in both runners, right?
Then perhaps you could refactor it into a separate script that both runners source, so they get the same value?
This would make the tests simpler and more robust to maintain.
Oh, the second runner is actually a Python script... That makes things more annoying, but still doable. I fear the definitions could easily go out of sync if left as is. Even now it's hard to validate that they are in sync.
Any thoughts on this, @gulsumgudukbay?
working on it
Looks good to me, other than the problem of duplicating the list of multi-GPU tests, as Aleksei mentioned. I'm okay to merge after that's fixed.
Great job, Gulsum, thanks! I learned about mapfile :)
Thanks! Yep, it was new to me as well, such a nice tool.
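For readers following the thread, here is a minimal sketch of how a shell runner could load a shared test list with `mapfile`. It is not the code merged in this PR: the list file, its one-path-per-line format, and the second test name are assumptions made for illustration. Only `tests/pmap_test.py` and the `MULTI_GPU_TESTS` variable name come from the diff above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical shared list file: one test path per line, '#' starts a comment.
# A temporary file stands in for it here so the sketch is self-contained.
LIST_FILE="$(mktemp)"
cat > "$LIST_FILE" <<'EOF'
# Multi-GPU test files
tests/pmap_test.py
tests/hypothetical_multi_device_test.py
EOF

# Drop comment and blank lines, then read the rest into a bash array.
# mapfile is a bash builtin (bash 4 or newer); -t strips trailing newlines.
mapfile -t MULTI_GPU_TESTS < <(grep -Ev '^[[:space:]]*(#|$)' "$LIST_FILE")

echo "count=${#MULTI_GPU_TESTS[@]}"
printf '%s\n' "${MULTI_GPU_TESTS[@]}"

rm -f "$LIST_FILE"
```

Under the same assumption, a Python runner could read the identical file line by line, so the list would be defined in one place only.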
Some tests that require multiple GPUs were being executed by the run_single_gpu.py script, which was a problem because that script dedicates only a single GPU to each test.
This PR therefore identifies the tests that require multiple GPUs, migrates them to the run_multi_gpu.sh script, and excludes them from the run_single_gpu.py script.
After this is merged, the abort support that lives in run_single_gpu.py will also be added to the run_multi_gpu.sh script.