
Conversation

@JartX
Contributor

@JartX JartX commented Aug 13, 2025

This PR fixes an issue where vLLM failed to start on ROCm GPUs that are not compatible with the rocm_aiter_fa attention backend. An example of such a GPU is the AMD Radeon RX 7900 XTX, which uses the RDNA 3 architecture.

The bug was introduced in commit 1ee5ead, which hardcoded the import of the vllm.v1.attention.backends.rocm_aiter_fa module in vllm/v1/spec_decode/eagle.py. This caused vLLM to fail on startup before it could even select a different attention backend.

To solve this, I've added a conditional check that lets the user explicitly enable this backend. The rocm_aiter_fa module is now only imported if the environment variable VLLM_ROCM_USE_AITER is set to 1.

This change ensures that:

Users with ROCm GPUs that are not compatible with the rocm_aiter_fa backend can use vLLM without any startup failures.

Users who do need this backend can still enable it manually, preserving the original functionality.
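In essence, the guarded import looks like this (a minimal sketch; the exact diff is shown in the review thread below, where allowed_types is the tuple of attention metadata classes that eagle.py accepts):

import os

# Only import the AITER flash-attention backend when the user has
# explicitly opted in via VLLM_ROCM_USE_AITER=1.
if os.environ.get("VLLM_ROCM_USE_AITER") == "1":
    from vllm.v1.attention.backends.rocm_aiter_fa import (
        AiterFlashAttentionMetadata)
    allowed_types += (AiterFlashAttentionMetadata, )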

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run fastcheck CI, which runs a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot added the rocm (Related to AMD ROCm), speculative-decoding, and v1 labels on Aug 13, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request correctly fixes a startup failure on incompatible ROCm GPUs by making the import of rocm_aiter_fa conditional based on an environment variable. The approach is sound. My review includes one high-severity suggestion to improve performance by caching the environment variable lookup, as it currently resides in a hot path.

Comment on lines 240 to 243
if os.environ.get("VLLM_ROCM_USE_AITER") == "1":
from vllm.v1.attention.backends.rocm_aiter_fa import (
AiterFlashAttentionMetadata)
allowed_types += (AiterFlashAttentionMetadata, )
Contributor


high

Calling os.environ.get() inside the propose method can introduce performance overhead, as this method is on a hot path during inference. It's better to check the environment variable only once when the module is imported.

I recommend defining a module-level constant at the top of the file:

# At the top of vllm/v1/spec_decode/eagle.py
import os
_VLLM_ROCM_USE_AITER = os.environ.get("VLLM_ROCM_USE_AITER") == "1"

Then, you can use this constant here:

if _VLLM_ROCM_USE_AITER:
    from vllm.v1.attention.backends.rocm_aiter_fa import (
        AiterFlashAttentionMetadata)
    allowed_types += (AiterFlashAttentionMetadata, )

This change will improve performance by avoiding repeated environment variable lookups.

Contributor


Agree

@JartX
Contributor Author

JartX commented Aug 15, 2025

Hi @russellb, would you be so kind as to review this PR? Right now you can't start vLLM with ROCm on RDNA3 GPUs like the 7900 XTX.

if os.environ.get("VLLM_ROCM_USE_AITER") == "1":
from vllm.v1.attention.backends.rocm_aiter_fa import (
AiterFlashAttentionMetadata)
allowed_types += (AiterFlashAttentionMetadata, )
Member


See the pre-commit failures under this line

(TritonAttentionMetadata, AiterFlashAttentionMetadata,
FlashAttentionMetadata))
allowed_types = (TritonAttentionMetadata, FlashAttentionMetadata)
if os.environ.get("VLLM_ROCM_USE_AITER") == "1":
Member


Is there any way you can make this more dynamic if it's known what device types would support this vs not?

Contributor Author


@russellb
I think the architecture names can be used, but the list will always have to be expanded. Do you know of another mechanism for this?

For example:

def _is_rocm_gpu_with_matrix_cores() -> bool:
    if not torch.cuda.is_available() or not torch.version.hip:
        return False
    try:
        device_properties = torch.cuda.get_device_properties(
            torch.cuda.current_device())
        gcn_arch_name = getattr(device_properties, "gcnArchName", "")
        supported_archs = ("gfx908", "gfx90a", "gfx940", "gfx941", "gfx942")
        return any(gcn_arch_name.startswith(arch) for arch in supported_archs)
    except (RuntimeError, AttributeError):
        return False

Contributor

@tjtanaa tjtanaa Aug 17, 2025


@JartX
Let's cache the value of os.environ.get, as its overhead is large, similar to
#17067

An alternative approach is to check whether aiter is installed using from importlib.util import find_spec. However, this is also a very costly operation; it should only be called once, when a class is initialized or a file is imported.
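A minimal sketch of that caching idea, assuming module-level constants in eagle.py (the names here are illustrative, not the final code):

import os
from importlib.util import find_spec

# Evaluated once at import time, not on every propose() call, so the hot
# path only reads Python booleans.
_USE_ROCM_AITER = os.environ.get("VLLM_ROCM_USE_AITER") == "1"
_HAS_AITER = find_spec("aiter") is not None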

Contributor Author


@tjtanaa Many thanks for your answer. I've done it the other way :) 47f9141

Using a fallback

@JartX JartX force-pushed the fix/disable_force_spec_eagle_rocm_aiter branch 3 times, most recently from 625860e to 9bc9f67 on August 16, 2025 11:35
Signed-off-by: JartX <[email protected]>
@JartX JartX force-pushed the fix/disable_force_spec_eagle_rocm_aiter branch from 9bc9f67 to d23a403 on August 16, 2025 11:36
@JartX
Contributor Author

JartX commented Aug 16, 2025

Hi @russellb, new approach using a fallback:
47f9141

Member

@yewentao256 yewentao256 left a comment


LGTM, thanks for the work!

@hongxiayang
Collaborator

cc @tjtanaa

@JartX
Contributor Author

JartX commented Aug 17, 2025

@tjtanaa many thanks for your time, 47f9141

The refactor uses a fallback :)

@tjtanaa
Contributor

tjtanaa commented Aug 17, 2025

@JartX Maybe let's do this instead: we store the allowed_types in the EagleProposer class. I have written a simple script to time the overhead, and it seems quite high, as this cost is incurred on every decode step. Usually we decode a few thousand tokens, for example in thinking mode, so the cost is multiplied a thousand-fold per request.

======================================================================
IMPORT TRY-EXCEPT OVERHEAD BENCHMARK
======================================================================
SUCCESS CASE (module exists):
Samples: 50,000
Mean: 8.938 μs
Median: 0.000 μs
Min: 0.000 μs
Max: 200.001 μs
90th %ile: 0.000 μs
95th %ile: 100.000 μs
99th %ile: 100.001 μs
Std Dev: 28.655 μs

FAILURE CASE (module missing):
Samples: 50,000
Mean: 62.346 μs
Median: 99.999 μs
Min: 0.000 μs
Max: 1100.000 μs
90th %ile: 100.001 μs
95th %ile: 100.001 μs
99th %ile: 200.001 μs
Std Dev: 59.740 μs

BASELINE (no try-except):
Samples: 50,000
Mean: 0.270 μs
Median: 0.000 μs
Min: 0.000 μs
Max: 1000.000 μs
90th %ile: 0.000 μs
95th %ile: 0.000 μs
99th %ile: 0.000 μs
Std Dev: 6.703 μs
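The timing script itself is not included in the thread; a rough reconstruction of this kind of try/except import benchmark (module names, sample count, and output format are assumptions, not the original script) could look like:

import statistics
import time

def time_guarded_import(module_name: str, samples: int = 50_000) -> list[float]:
    # Time a guarded import in a tight loop; results are in microseconds.
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        try:
            __import__(module_name)
        except ImportError:
            pass
        timings.append((time.perf_counter() - start) * 1e6)
    return timings

for label, module in (("success (module exists)", "json"),
                      ("failure (module missing)", "definitely_missing_module")):
    t = time_guarded_import(module)
    print(f"{label}: mean={statistics.mean(t):.3f} us, "
          f"p99={sorted(t)[int(0.99 * len(t))]:.3f} us")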

Proposed solution

from importlib.util import find_spec

class EagleProposer:

    def __init__(
        self,
        vllm_config: VllmConfig,
        device: torch.device,
        runner=None,
    ):
        ...
        self.allowed_attn_types = ()
        if current_platform.is_rocm():
            self.allowed_attn_types += (TritonAttentionMetadata,
                                        FlashAttentionMetadata)
            if find_spec("aiter"):
                from vllm.v1.attention.backends.rocm_aiter_fa import (
                    AiterFlashAttentionMetadata)
                self.allowed_attn_types += (AiterFlashAttentionMetadata, )
        else:
            self.allowed_attn_types = (FlashAttentionMetadata,
                                       TreeAttentionMetadata)
        ...

    def propose(
        self,
        # [num_tokens]
        target_token_ids: torch.Tensor,
        # [num_tokens]
        target_positions: torch.Tensor,
        # [num_tokens, hidden_size]
        target_hidden_states: torch.Tensor,
        # [batch_size]
        next_token_ids: torch.Tensor,
        common_attn_metadata: CommonAttentionMetadata,
        sampling_metadata: SamplingMetadata,
        mm_embeds: Optional[list[torch.Tensor]] = None,
    ) -> torch.Tensor:
        ...
        assert isinstance(attn_metadata, self.allowed_attn_types)
        ...
    

@tjtanaa
Contributor

tjtanaa commented Aug 19, 2025

@JartX

if current_platform.is_rocm():

...
from vllm.platforms.rocm import on_mi3xx
...
if current_platform.is_rocm() and find_spec("aiter") and on_mi3xx:
...

Then we can revert all of the changes to eagle.py.

This also handles the case where aiter is installed but not supported.
You would not need to modify Dockerfile.rocm in this case.
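A hedged sketch of that combined gate (the helper name is illustrative, and since the snippet above references on_mi3xx without calling it, treating it as a callable is an assumption):

from importlib.util import find_spec

from vllm.platforms import current_platform

def _can_use_rocm_aiter_fa() -> bool:
    # Only consider the AITER backend on ROCm, with the aiter package
    # installed, and on a GPU generation that actually supports it.
    if not (current_platform.is_rocm() and find_spec("aiter")):
        return False
    from vllm.platforms.rocm import on_mi3xx
    return bool(on_mi3xx())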

@JartX
Contributor Author

JartX commented Aug 19, 2025

Hi @tjtanaa, bad news: it crashes at another point after applying the last recommendation.

vllm1-1  | ERROR 08-19 15:55:09 [multiproc_executor.py:559] WorkerProc failed to start.
vllm1-1  | ERROR 08-19 15:55:09 [multiproc_executor.py:559] Traceback (most recent call last):
vllm1-1  | ERROR 08-19 15:55:09 [multiproc_executor.py:559]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 533, in worker_main
vllm1-1  | ERROR 08-19 15:55:09 [multiproc_executor.py:559]     worker = WorkerProc(*args, **kwargs)
vllm1-1  | ERROR 08-19 15:55:09 [multiproc_executor.py:559]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm1-1  | ERROR 08-19 15:55:09 [multiproc_executor.py:559]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 379, in __init__
vllm1-1  | ERROR 08-19 15:55:09 [multiproc_executor.py:559]     wrapper.init_worker(all_kwargs)
vllm1-1  | ERROR 08-19 15:55:09 [multiproc_executor.py:559]   File "/usr/local/lib/python3.12/dist-packages/vllm/worker/worker_base.py", line 556, in init_worker
vllm1-1  | ERROR 08-19 15:55:09 [multiproc_executor.py:559]     worker_class = resolve_obj_by_qualname(
vllm1-1  | ERROR 08-19 15:55:09 [multiproc_executor.py:559]                    ^^^^^^^^^^^^^^^^^^^^^^^^
vllm1-1  | ERROR 08-19 15:55:09 [multiproc_executor.py:559]   File "/usr/local/lib/python3.12/dist-packages/vllm/utils/__init__.py", line 2568, in resolve_obj_by_qualname
vllm1-1  | ERROR 08-19 15:55:09 [multiproc_executor.py:559]     module = importlib.import_module(module_name)
vllm1-1  | ERROR 08-19 15:55:09 [multiproc_executor.py:559]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm1-1  | ERROR 08-19 15:55:09 [multiproc_executor.py:559]   File "/usr/lib/python3.12/importlib/__init__.py", line 90, in import_module
vllm1-1  | ERROR 08-19 15:55:09 [multiproc_executor.py:559]     return _bootstrap._gcd_import(name[level:], package, level)
vllm1-1  | ERROR 08-19 15:55:09 [multiproc_executor.py:559]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm1-1  | ERROR 08-19 15:55:09 [multiproc_executor.py:559]   File "<frozen importlib._bootstrap>", line 1387, in _gcd_import
vllm1-1  | ERROR 08-19 15:55:09 [multiproc_executor.py:559]   File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
vllm1-1  | ERROR 08-19 15:55:09 [multiproc_executor.py:559]   File "<frozen importlib._bootstrap>", line 1331, in _find_and_load_unlocked
vllm1-1  | ERROR 08-19 15:55:09 [multiproc_executor.py:559]   File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
vllm1-1  | ERROR 08-19 15:55:09 [multiproc_executor.py:559]   File "<frozen importlib._bootstrap_external>", line 999, in exec_module
vllm1-1  | ERROR 08-19 15:55:09 [multiproc_executor.py:559]   File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
vllm1-1  | ERROR 08-19 15:55:09 [multiproc_executor.py:559]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 33, in <module>
vllm1-1  | ERROR 08-19 15:55:09 [multiproc_executor.py:559]     from vllm.v1.worker.gpu_model_runner import GPUModelRunner
vllm1-1  | ERROR 08-19 15:55:09 [multiproc_executor.py:559]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 75, in <module>
vllm1-1  | ERROR 08-19 15:55:09 [multiproc_executor.py:559]     from vllm.v1.spec_decode.eagle import EagleProposer
vllm1-1  | ERROR 08-19 15:55:09 [multiproc_executor.py:559]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/spec_decode/eagle.py", line 23, in <module>
vllm1-1  | ERROR 08-19 15:55:09 [multiproc_executor.py:559]     from vllm.v1.attention.backends.rocm_aiter_fa import (
vllm1-1  | ERROR 08-19 15:55:09 [multiproc_executor.py:559]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/attention/backends/rocm_aiter_fa.py", line 23, in <module>
vllm1-1  | ERROR 08-19 15:55:09 [multiproc_executor.py:559]     import aiter
vllm1-1  | ERROR 08-19 15:55:09 [multiproc_executor.py:559]   File "/usr/local/lib/python3.12/dist-packages/aiter/__init__.py", line 43, in <module>
vllm1-1  | ERROR 08-19 15:55:09 [multiproc_executor.py:559]     from .ops.quant import *
vllm1-1  | ERROR 08-19 15:55:09 [multiproc_executor.py:559]   File "/usr/local/lib/python3.12/dist-packages/aiter/ops/quant.py", line 12, in <module>
vllm1-1  | ERROR 08-19 15:55:09 [multiproc_executor.py:559]     from ..utility import dtypes, fp4_utils
vllm1-1  | ERROR 08-19 15:55:09 [multiproc_executor.py:559]   File "/usr/local/lib/python3.12/dist-packages/aiter/utility/dtypes.py", line 18, in <module>
vllm1-1  | ERROR 08-19 15:55:09 [multiproc_executor.py:559]     fp8 = get_dtype_fp8()
vllm1-1  | ERROR 08-19 15:55:09 [multiproc_executor.py:559]           ^^^^^^^^^^^^^^^
vllm1-1  | ERROR 08-19 15:55:09 [multiproc_executor.py:559]   File "/usr/local/lib/python3.12/dist-packages/aiter/utility/dtypes.py", line 13, in get_dtype_fp8
vllm1-1  | ERROR 08-19 15:55:09 [multiproc_executor.py:559]     return defaultDtypes[get_gfx()]["fp8"]
vllm1-1  | ERROR 08-19 15:55:09 [multiproc_executor.py:559]            ~~~~~~~~~~~~~^^^^^^^^^^^
vllm1-1  | ERROR 08-19 15:55:09 [multiproc_executor.py:559] KeyError: 'gfx1100'

vllm1-1  | (EngineCore_0 pid=153) ERROR 08-19 15:55:13 [core.py:700] EngineCore failed to start.
vllm1-1  | (EngineCore_0 pid=153) ERROR 08-19 15:55:13 [core.py:700] Traceback (most recent call last):
vllm1-1  | (EngineCore_0 pid=153) ERROR 08-19 15:55:13 [core.py:700]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 691, in run_engine_core
vllm1-1  | (EngineCore_0 pid=153) ERROR 08-19 15:55:13 [core.py:700]     engine_core = EngineCoreProc(*args, **kwargs)
vllm1-1  | (EngineCore_0 pid=153) ERROR 08-19 15:55:13 [core.py:700]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm1-1  | (EngineCore_0 pid=153) ERROR 08-19 15:55:13 [core.py:700]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 492, in __init__
vllm1-1  | (EngineCore_0 pid=153) ERROR 08-19 15:55:13 [core.py:700]     super().__init__(vllm_config, executor_class, log_stats,
vllm1-1  | (EngineCore_0 pid=153) ERROR 08-19 15:55:13 [core.py:700]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 80, in __init__
vllm1-1  | (EngineCore_0 pid=153) ERROR 08-19 15:55:13 [core.py:700]     self.model_executor = executor_class(vllm_config)
vllm1-1  | (EngineCore_0 pid=153) ERROR 08-19 15:55:13 [core.py:700]                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm1-1  | (EngineCore_0 pid=153) ERROR 08-19 15:55:13 [core.py:700]   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/executor_base.py", line 54, in __init__
vllm1-1  | (EngineCore_0 pid=153) ERROR 08-19 15:55:13 [core.py:700]     self._init_executor()
vllm1-1  | (EngineCore_0 pid=153) ERROR 08-19 15:55:13 [core.py:700]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 96, in _init_executor
vllm1-1  | (EngineCore_0 pid=153) ERROR 08-19 15:55:13 [core.py:700]     self.workers = WorkerProc.wait_for_ready(unready_workers)
vllm1-1  | (EngineCore_0 pid=153) ERROR 08-19 15:55:13 [core.py:700]                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm1-1  | (EngineCore_0 pid=153) ERROR 08-19 15:55:13 [core.py:700]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 472, in wait_for_ready
vllm1-1  | (EngineCore_0 pid=153) ERROR 08-19 15:55:13 [core.py:700]     raise e from None
vllm1-1  | (EngineCore_0 pid=153) ERROR 08-19 15:55:13 [core.py:700] Exception: WorkerProc initialization failed due to an exception in a background process. See stack trace for root cause.

I would say that this error comes from a different point. Are you sure we can't go with either of the two solutions verified above?
At least on my side, I have verified that the most solid approach is the one you proposed of doing the checks in __init__ itself, because you don't have to touch the attention backends or update the documentation.
Thank you very much for your time.

@JartX
Contributor Author

JartX commented Aug 19, 2025

@tjtanaa I think this is now the better way:

from importlib.util import find_spec

class EagleProposer:

    def __init__(
        self,
        vllm_config: VllmConfig,
        device: torch.device,
        runner=None,
    ):
        ...
        self.allowed_attn_types = ()
        if current_platform.is_rocm():
            self.allowed_attn_types += (TritonAttentionMetadata,
                                        FlashAttentionMetadata)
            if find_spec("aiter"):
                from vllm.v1.attention.backends.rocm_aiter_fa import (
                    AiterFlashAttentionMetadata)
                self.allowed_attn_types += (AiterFlashAttentionMetadata, )
        else:
            self.allowed_attn_types = (FlashAttentionMetadata,
                                       TreeAttentionMetadata)
        ...

    def propose(
        self,
        # [num_tokens]
        target_token_ids: torch.Tensor,
        # [num_tokens]
        target_positions: torch.Tensor,
        # [num_tokens, hidden_size]
        target_hidden_states: torch.Tensor,
        # [batch_size]
        next_token_ids: torch.Tensor,
        common_attn_metadata: CommonAttentionMetadata,
        sampling_metadata: SamplingMetadata,
        mm_embeds: Optional[list[torch.Tensor]] = None,
    ) -> torch.Tensor:
        ...
        assert isinstance(attn_metadata, self.allowed_attn_types)
        ...
    

Everything goes smoothly and works like a charm.


if self.use_cuda_graph and \
batch_size <= self.cudagraph_batch_sizes[-1]:
batch_size <= self.cudagraph_batch_sizes[-1]:
Contributor


NITs, can you revert all of the unrelated changes?

Contributor Author


Hi! @tjtanaa
These changes were included so I could pass pre-commit. I've only been contributing to the project for a short time, and @mgoin told me that pre-commit should normally be used:

https://marketplace.visualstudio.com/items?itemName=elagil.pre-commit-helper

https://github.com/pre-commit/pre-commit

https://github.com/vllm-project/vllm/blob/main/.github/workflows/pre-commit.yml

so that it would correctly format the file after my changes.

Sorry if this bothered you. Thank you very much for your time and dedication.

If you find that I have it configured incorrectly, please don't hesitate to let me know.

P.S.: If I remove the spaces and run the pre-commit check again, I get an error, so I have to run the fix. It then adds the spaces back and everything passes.

@tjtanaa
Contributor

tjtanaa commented Aug 20, 2025

@JartX Can you revert all of the unrelated changes, i.e. the changes in indentation and spacing?
Otherwise LGTM.

assert isinstance(attn_metadata, FlashAttentionMetadata)
# The mypy errors are caused because mypy cannot infer the type of
# attn_metadata. We add this assert to help mypy.
assert isinstance(attn_metadata, FlashAttentionMetadata)
Contributor


@JartX I tested with another backend. This will cause issues because

FlashAttentionMetadata is not a generic class.

TreeAttentionMetadata, AiterFlashAttentionMetadata, TritonAttentionMetadata, and FlashAttentionMetadata are four different classes.

I have opened a PR into your branch, JartX#1. It is a mypy fix using a Protocol class.
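A minimal sketch of the Protocol idea (the class name and attributes below are hypothetical; the actual fix is in JartX#1):

from typing import Protocol

import torch

class SpecDecodeAttentionMetadata(Protocol):
    # Structural typing: any backend metadata class exposing these fields
    # satisfies the annotation without sharing a common base class.
    max_query_len: int
    seq_lens: torch.Tensor
    slot_mapping: torch.Tensor

# Annotating attn_metadata with the Protocol lets mypy check attribute
# access, while the runtime isinstance() check can still use the concrete
# per-backend metadata classes.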

Contributor Author

@JartX JartX Aug 20, 2025


Merged. @tjtanaa Thank you very much for helping me with the development and testing. I have very limited hardware and am still getting familiar with how vLLM development works.

Contributor


Thank you for your work on this PR as well @JartX 🥂

…aiter

[Bugfix] Fix mypy error with Protocol
Member

@DarkLight1337 DarkLight1337 left a comment


LGTM if tests pass, thanks!

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) August 20, 2025 12:32
@github-actions github-actions bot added the ready (ONLY add when PR is ready to merge/full CI is needed) label on Aug 20, 2025
@vllm-bot vllm-bot merged commit 3b11b26 into vllm-project:main Aug 20, 2025
45 of 47 checks passed
@JartX JartX deleted the fix/disable_force_spec_eagle_rocm_aiter branch August 20, 2025 16:15
djmmoss pushed a commit to djmmoss/vllm that referenced this pull request Aug 21, 2025
…patible with AITER (vllm-project#22795)

Signed-off-by: JartX <[email protected]>
Signed-off-by: tjtanaa <[email protected]>
Co-authored-by: tjtanaa <[email protected]>
Signed-off-by: Duncan Moss <[email protected]>
shanes-cerebras pushed a commit to smsegal/vllm that referenced this pull request Aug 24, 2025
…patible with AITER (vllm-project#22795)

Signed-off-by: JartX <[email protected]>
Signed-off-by: tjtanaa <[email protected]>
Co-authored-by: tjtanaa <[email protected]>
epwalsh pushed a commit to epwalsh/vllm that referenced this pull request Aug 28, 2025
…patible with AITER (vllm-project#22795)

Signed-off-by: JartX <[email protected]>
Signed-off-by: tjtanaa <[email protected]>
Co-authored-by: tjtanaa <[email protected]>
xiao-llm pushed a commit to xiao-llm/vllm that referenced this pull request Aug 28, 2025
…patible with AITER (vllm-project#22795)

Signed-off-by: JartX <[email protected]>
Signed-off-by: tjtanaa <[email protected]>
Co-authored-by: tjtanaa <[email protected]>
Signed-off-by: Xiao Yu <[email protected]>
zhewenl pushed a commit to zhewenl/vllm that referenced this pull request Aug 28, 2025
…patible with AITER (vllm-project#22795)

Signed-off-by: JartX <[email protected]>
Signed-off-by: tjtanaa <[email protected]>
Co-authored-by: tjtanaa <[email protected]>
mengxingkongzhouhan pushed a commit to mengxingkongzhouhan/vllm that referenced this pull request Aug 30, 2025
…patible with AITER (vllm-project#22795)

Signed-off-by: JartX <[email protected]>
Signed-off-by: tjtanaa <[email protected]>
Co-authored-by: tjtanaa <[email protected]>
zhewenl pushed a commit to zhewenl/vllm that referenced this pull request Sep 3, 2025
…patible with AITER (vllm-project#22795)

Signed-off-by: JartX <[email protected]>
Signed-off-by: tjtanaa <[email protected]>
Co-authored-by: tjtanaa <[email protected]>
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
…patible with AITER (vllm-project#22795)

Signed-off-by: JartX <[email protected]>
Signed-off-by: tjtanaa <[email protected]>
Co-authored-by: tjtanaa <[email protected]>

Labels

ready (ONLY add when PR is ready to merge/full CI is needed), rocm (Related to AMD ROCm), speculative-decoding, v1

7 participants