Conversation

@huydhn (Contributor) commented Jul 9, 2025

Pick up the work on #39 to support CPU benchmarks. The PR is more involved than I expected; the list of changes includes:

  1. Put the vLLM benchmark suite into the appropriate platform folders for cuda, rocm, and cpu
  2. Extend the logic in .github/scripts/generate_vllm_benchmark_matrix.py to read from the correct folder from (1) (see the first sketch after this list)
  3. Add .github/scripts/test_generate_vllm_benchmark_matrix.py for (2) because the logic is pretty complex now
  4. Extend the logic in .github/scripts/setup_vllm_benchmark.py to copy from the correct folder from (1)
  5. Use our existing linux.24xl.spr-metal runner for the CPU benchmark until Intel's runner is ready
  6. Incorporate the changes from #39 (enable CPU benchmark for vLLM Perf Dashboard) into the workflow:
    1. Use the vLLM CPU Docker image at public.ecr.aws/q9t5s3a7/vllm-ci-postmerge-repo:<HEAD_SHA>-cpu
    2. Pass ON_CPU to the vLLM benchmark script
  7. Fix the use of torch.cuda.get_device_name() in .github/scripts/upload_benchmark_results.py, because there is no CUDA device on CPU runners (see the second sketch after this list)
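
For context, here is a minimal sketch of the platform-folder lookup described in items 1-2; the directory names and layout below are assumptions for illustration, not the actual code in generate_vllm_benchmark_matrix.py:

```python
import glob
import os

# Hypothetical per-platform layout mirroring item 1; the real paths in the
# repo may differ.
BENCHMARK_CONFIG_DIRS = {
    "cuda": "vllm-benchmarks/benchmarks/cuda",
    "rocm": "vllm-benchmarks/benchmarks/rocm",
    "cpu": "vllm-benchmarks/benchmarks/cpu",
}

def list_benchmark_configs(device_type: str) -> list[str]:
    """Return the benchmark config files found in one platform folder."""
    config_dir = BENCHMARK_CONFIG_DIRS[device_type]
    return sorted(glob.glob(os.path.join(config_dir, "*.json")))
```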
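
And a hedged sketch of the item-7 fix: guard the CUDA query so CPU runners do not crash on torch.cuda.get_device_name(). The helper name and the fallback are illustrative; the actual change in upload_benchmark_results.py may differ:

```python
import platform

import torch

def get_device_name(device_type: str) -> str:
    """Only query CUDA when a CUDA device actually exists (item 7)."""
    if device_type != "cpu" and torch.cuda.is_available():
        return torch.cuda.get_device_name()
    # CPU fallback: platform.processor() can be empty on some Linux systems,
    # so fall back to the machine architecture string.
    return platform.processor() or platform.machine()
```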

Testing

https://github.com/pytorch/pytorch-integration-testing/actions/runs/16231541112

cc @louie-tsai

@louie-tsai (Collaborator) commented Jul 10, 2025

Some path issue:
[screenshot]

I saw the serving-test.json file under the /workspace/.buildkite/nightly-benchmarks/tests folder.
[screenshot]

Not sure whether we have the right path in the current workflow:
```bash
docker run --gpus all \
  -e NVIDIA_DRIVER_CAPABILITIES=all \
  -e SCCACHE_SERVER_PORT=5228 \
  -e SCCACHE_BUCKET -e SCCACHE_REGION \
  -e DEVICE_NAME -e DEVICE_TYPE \
  -e HF_TOKEN -e ENGINE_VERSION \
  -e SAVE_TO_PYTORCH_BENCHMARK_FORMAT \
  -e ON_CPU=0 \
  --ipc=host --tty --security-opt seccomp=unconfined \
  -v /home/bob/_work/pytorch-integration-testing/pytorch-integration-testing:/tmp/workspace \
  -w /tmp/workspace \
  public.ecr.aws/q9t5s3a7/vllm-ci-postmerge-repo:b6e7e3d58f57aee30a55b3160645ddb2f029d3c8 \
  bash -xc 'cd vllm-benchmarks/vllm && bash .buildkite/nightly-benchmarks/scripts/run-performance-benchmarks.sh'
```

@huydhn (Contributor, Author) commented Jul 10, 2025

> Some path issue

Yeah, the step that sets up the benchmark needs a tweak, per my comment in #39 (comment). When the device is CPU, it looks for files with a _cpu suffix, and that's fine. However, for CUDA or ROCm devices there is no _cuda or _rocm suffix. This looks like an easy tweak, so I could do it here if you prefer.
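
A minimal sketch of that tweak, assuming the setup script resolves config filenames by device type; the function name and the .json extension here are illustrative:

```python
def resolve_benchmark_config(base_name: str, device_type: str) -> str:
    """Sketch of the suffix rule discussed above: CPU configs carry a _cpu
    suffix, while CUDA and ROCm configs keep the unsuffixed filename."""
    if device_type == "cpu":
        return f"{base_name}_cpu.json"
    return f"{base_name}.json"
```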

@huydhn temporarily deployed to pytorch-x-vllm July 12, 2025 01:26 — with GitHub Actions Inactive
```
2: [
"linux.aws.h100.4",
"linux.rocm.gpu.mi300.2",
"linux.24xl.spr-metal",
```

A Collaborator commented on this hunk: 24xlarge has only 1 NUMA node, so we should not put it under TP=2.
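
To make that constraint concrete, here is a quick way to check how many NUMA nodes a runner exposes (assuming the standard Linux sysfs layout); per the comment above, a single-node machine is not a good fit for TP=2:

```python
import os

# Count the NUMA nodes the kernel exposes; directories are named node0,
# node1, ... (Linux sysfs layout assumed).
node_root = "/sys/devices/system/node"
nodes = [d for d in os.listdir(node_root)
         if d.startswith("node") and d[len("node"):].isdigit()]
print(f"NUMA nodes: {len(nodes)}")
```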

@huydhn temporarily deployed to pytorch-x-vllm July 13, 2025 08:29 — with GitHub Actions Inactive
@huydhn merged commit 046a22c into main Jul 15, 2025 (11 of 13 checks passed)
@huydhn deleted the 39 branch July 17, 2025
@fadara01 (Collaborator) commented:

Hi @huydhn - we would like to enable this for AArch64 too (linux.arm64.m7g.metal). What's the best place to ask for write access to this repository so that one can test the changes?

@huydhn (Contributor, Author) commented Sep 25, 2025

> Hi @huydhn - we would like to enable this for AArch64 too (linux.arm64.m7g.metal). What's the best place to ask for write access to this repository so that one can test the changes?

I could grant you that permission, but I want to check what it's needed for. I thought that submitting a PR like this one would be sufficient? We do have the linux.arm64.m7g.metal runner ready to use.

@cfRod (Collaborator) commented Sep 30, 2025

@huydhn I think we mean permission to trigger the workflow for the dashboard? We have this for the TorchInductor HUD dashboard.

@huydhn (Contributor, Author) commented Oct 2, 2025

> @huydhn I think we mean permission to trigger the workflow for the dashboard? We have this for the TorchInductor HUD dashboard.

Ah, got it. Ping me on the vLLM Slack with the usernames, and I can help grant the permissions you need.
