Your current environment
System Info
OS: Linux (e.g., Fedora 43)
Hardware: AMD Strix Halo APU (gfx1151 / RDNA 3.5)
vLLM version: v0.19.2 (and recent nightlies/main)
Model: openai/gpt-oss-20b (or any gpt_oss_mxfp4 quantized MoE model)
🐛 Describe the bug
I am trying to run vLLM on an AMD Strix Halo (gfx1151) using ROCm. The environment is properly configured to compile Triton kernels. Previously, gpt-oss-20b (which initializes with gpt_oss_mxfp4 quantization) ran correctly on this setup and used the Triton MXFP4 MoE backend as expected.
However, a recent update added an explicit upper bound of < (11, 0) to the device_capability checks for the Triton MoE kernels.
- In vllm/model_executor/layers/fused_moe/experts/gpt_oss_triton_kernels_moe.py:
    def _supports_current_device() -> bool:
        ...
        return (9, 0) <= (cap.major, cap.minor) < (11, 0)
- In vllm/model_executor/layers/fused_moe/oracle/mxfp4.py:
    triton_kernels_supported = has_triton_kernels() and (
        9,
        0,
    ) <= current_platform.get_device_capability() < (11, 0)
Because vLLM maps gfx1151 to a device capability of (11, 5), the < (11, 0) upper bound rejects it, along with the rest of the RDNA3/RDNA3.5 family. As a result, the backend oracle drops the Triton kernels, finds no other MXFP4 backend for ROCm, and raises:
NotImplementedError: No MXFP4 MoE backend supports the deployment configuration.
Could this check please be widened to (9, 0) <= cap < (12, 0) to allow RDNA3 architectures? Or was there a specific hardware-level bug on Blackwell or future architectures that made the hard < (11, 0) upper bound necessary?
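To make the comparison concrete, here is a minimal standalone sketch of the range check using plain Python tuples. It does not import vLLM: the helper name in_capability_range and the constant GFX1151_CAPABILITY are hypothetical, and the (11, 5) value is simply the mapping described above.

```python
from typing import Tuple

Capability = Tuple[int, int]

# Capability that vLLM reports for gfx1151, as described in this issue.
GFX1151_CAPABILITY: Capability = (11, 5)


def in_capability_range(cap: Capability, lower: Capability, upper: Capability) -> bool:
    """Hypothetical helper: True if lower <= cap < upper.

    Tuples compare lexicographically, so (11, 5) < (11, 0) is False
    while (11, 5) < (12, 0) is True.
    """
    return lower <= cap < upper


# Current bound from the two snippets quoted above.
current_ok = in_capability_range(GFX1151_CAPABILITY, (9, 0), (11, 0))

# Widened bound proposed in this issue.
proposed_ok = in_capability_range(GFX1151_CAPABILITY, (9, 0), (12, 0))

print(f"current bound accepts gfx1151:  {current_ok}")
print(f"proposed bound accepts gfx1151: {proposed_ok}")
```

With the current bound the check rejects (11, 5); with the proposed bound it accepts it. Note that widening the bound this way would also admit every other device that reports an (11, x) capability, so whether the Triton kernels actually work on all of them would still need to be confirmed.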