fix(gpt-oss): emit fused raw expert tensors for SGLang by Jiang020609 · Pull Request #2004 · THUDM/slime

Jiang020609 · 2026-06-01T17:29:04Z

Summary

This fixes GPT-OSS raw Megatron-to-HF conversion for the non-colocate SGLang weight update path.

Previously, GPT-OSS expert tensors were emitted as per-expert gate_proj / up_proj / down_proj names. SGLang expects fused expert tensors for GPT-OSS, so the raw converter produced tensors that could not be loaded correctly by SGLang's fused MoE weight loader.

Changes

Convert linear_fc1.weight into interleaved fused gate_up_proj.
Transpose linear_fc2.weight before emitting down_proj.
Fuse per-expert tensors into 3D tensors shaped [num_experts, ...] before returning them to the weight update path.
Apply the same fused naming/layout for gate_up_proj_bias and down_proj_bias.
Add CPU unit tests for GPT-OSS raw expert weight and bias conversion.
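
The layout changes listed above can be sketched as follows. This is a minimal illustration, not the code in gpt_oss.py: it uses NumPy arrays in place of torch tensors, the helper names are hypothetical, and it assumes that Megatron's linear_fc1 stores the gate rows in the first half and the up rows in the second half, and that the fused gate_up_proj interleaves them along its last dimension (gate at even indices, up at odd indices).

```python
import numpy as np


def fuse_gate_up(fc1_per_expert):
    """Fuse per-expert linear_fc1 weights into one [E, H, 2*I] array.

    Each input is assumed to be [2*I, H] with gate rows first, then up rows.
    """
    fused = []
    for fc1 in fc1_per_expert:
        gate, up = np.split(fc1, 2, axis=0)  # each [I, H]
        # Interleave rows as g0, u0, g1, u1, ...
        interleaved = np.stack([gate, up], axis=1).reshape(fc1.shape)  # [2*I, H]
        fused.append(interleaved.T)  # [H, 2*I]
    return np.stack(fused, axis=0)  # [E, H, 2*I]


def fuse_gate_up_bias(fc1_bias_per_expert):
    """Fuse per-expert linear_fc1 biases ([2*I] each) into [E, 2*I], interleaved."""
    fused = []
    for bias in fc1_bias_per_expert:
        gate, up = np.split(bias, 2)  # each [I]
        fused.append(np.stack([gate, up], axis=1).reshape(-1))  # g0, u0, g1, ...
    return np.stack(fused, axis=0)


def fuse_down(fc2_per_expert):
    """Transpose each per-expert linear_fc2 weight ([H, I]) and stack to [E, I, H]."""
    return np.stack([w.T for w in fc2_per_expert], axis=0)
```

With this layout, the gate and up halves can be recovered from the fused tensor as fused[..., ::2] and fused[..., 1::2], which is a convenient property to assert in a CPU unit test.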

Validation

python -m ruff check slime/backends/megatron_utils/megatron_to_hf/gpt_oss.py tests/test_gpt_oss_raw_converter.py
python -m pytest tests/test_gpt_oss_raw_converter.py

GPU smoke test on 1x NVIDIA A800-SXM4-80GB with lmsys/gpt-oss-20b-bf16:

SGLang successfully loaded the GPT-OSS checkpoint.
Prepared fused update tensors:
- model.layers.0.mlp.experts.gate_up_proj: (32, 2880, 5760)
- model.layers.0.mlp.experts.down_proj: (32, 2880, 2880)
- model.layers.0.mlp.experts.gate_up_proj_bias: (32, 5760)
- model.layers.0.mlp.experts.down_proj_bias: (32, 2880)
engine.update_weights_from_tensor(...) returned (True, 'Success').
Deterministic generation before and after the no-op raw expert update matched.
Smoke result: GPT-OSS raw SGLang smoke passed.
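
The fused shapes reported above follow directly from the model dimensions. The helper below is a hypothetical sketch (it is not part of slime or SGLang) that derives them from num_experts, hidden_size, and intermediate_size. Because hidden_size and intermediate_size are both 2880 for this checkpoint, the log alone cannot confirm the axis order of down_proj; the [num_experts, intermediate_size, hidden_size] order used here is an assumption.

```python
def expected_fused_shapes(num_experts, hidden_size, intermediate_size, layer=0):
    """Return the expected shape of each fused expert tensor for one layer."""
    prefix = f"model.layers.{layer}.mlp.experts"
    return {
        # gate and up are fused along the last axis, hence 2 * intermediate_size
        f"{prefix}.gate_up_proj": (num_experts, hidden_size, 2 * intermediate_size),
        # assumed axis order; ambiguous here because both sizes are equal
        f"{prefix}.down_proj": (num_experts, intermediate_size, hidden_size),
        f"{prefix}.gate_up_proj_bias": (num_experts, 2 * intermediate_size),
        f"{prefix}.down_proj_bias": (num_experts, hidden_size),
    }


# Dimensions taken from the shapes listed in the smoke test output.
shapes = expected_fused_shapes(num_experts=32, hidden_size=2880, intermediate_size=2880)
```

Comparing this dictionary against the shapes of the prepared update tensors is a cheap sanity check to run before calling the weight update.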

fix(gpt-oss): fuse raw expert tensors for sglang

aa7c88e

Jiang020609 wants to merge 1 commit into THUDM:main from Jiang020609:fix/gpt-oss-raw-expert-conversion
