feat: add --dry-run VRAM/size estimation mode by mvanhorn · Pull Request #1958 · intel/auto-round

mvanhorn · 2026-06-26T19:37:17Z

Re-submit of #1592, reworked per your feedback. The VRAM estimate is now built on auto-round's own block-wise memory model instead of a raw parameter count:

- Reuses auto_round.utils.device.estimate_tuning_block_mem and get_moe_memory_ratio (@xin3he's card_0 = block_input_output + layer_activation + additional formula).
- Models peak memory at decoder-block granularity (@wenhuach21).
- Excludes the block input/output cache when low_gpu_mem_usage is set, and includes it otherwise (@wenhuach21's caching point).
- MoE handled via get_moe_memory_ratio; layer/hidden-size discovery is robust across config field names and nested configs (text_config etc.) since num_hidden_layers doesn't cover every model (@xin3he).
- Loads AutoConfig only, no weights.
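
The card_0 formula quoted above can be sketched roughly as follows. This is a minimal illustration, not auto-round's implementation: `sketch_block_peak`, its parameters, and the per-term byte formulas are hypothetical, and the actual numbers in this PR come from `estimate_tuning_block_mem` and `get_moe_memory_ratio`, whose signatures are not shown in this description.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass
class BlockMemEstimate:
    """The three terms of card_0, in GiB (hypothetical container)."""

    block_input_output_gib: float
    layer_activation_gib: float
    additional_gib: float

    @property
    def card_0_gib(self) -> float:
        # card_0 = block_input_output + layer_activation + additional
        return (
            self.block_input_output_gib
            + self.layer_activation_gib
            + self.additional_gib
        )


def sketch_block_peak(
    hidden_size: int,
    seqlen: int = 2048,
    batch_size: int = 8,
    nsamples: int = 128,
    n_linear_layers: int = 7,
    bytes_per_elem: int = 2,
    low_gpu_mem_usage: bool = False,
    moe_ratio: float = 1.0,
    additional_gib: float = 1.0,
) -> BlockMemEstimate:
    """Assumed per-block peak model; every formula here is illustrative."""
    # Cached block inputs plus outputs for all calibration samples.
    # Dropped from the GPU estimate when low_gpu_mem_usage is set.
    if low_gpu_mem_usage:
        cache_bytes = 0
    else:
        cache_bytes = 2 * nsamples * seqlen * hidden_size * bytes_per_elem

    # One activation tensor per linear layer for a single batch,
    # scaled by an assumed MoE ratio (1.0 for dense models).
    act_bytes = batch_size * seqlen * hidden_size * bytes_per_elem
    layer_act_bytes = n_linear_layers * act_bytes * moe_ratio

    return BlockMemEstimate(
        block_input_output_gib=cache_bytes / GIB,
        layer_activation_gib=layer_act_bytes / GIB,
        additional_gib=additional_gib,
    )
```

Under these assumptions, calling `sketch_block_peak(hidden_size=4096, low_gpu_mem_usage=True)` zeroes only the cache term, so the difference between the two modes is exactly the block input/output cache.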

Unit tests cover helper reuse, the low_gpu_mem_usage path, MoE, and layer-count fallbacks (8 passing). Supersedes #1592 (couldn't reopen after the rebase).
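
The layer-count and hidden-size fallbacks mentioned above could look something like the sketch below. The helper name `find_config_value` and the key lists are assumptions made for illustration (the PR description does not list the exact field names it checks); the stand-in configs use `SimpleNamespace` so the example runs without downloading anything.

```python
from types import SimpleNamespace

# Illustrative field names; the PR's actual lists may differ.
LAYER_KEYS = ("num_hidden_layers", "n_layer", "num_layers", "n_layers")
HIDDEN_KEYS = ("hidden_size", "n_embd", "d_model")
NESTED_KEYS = ("text_config", "language_config", "llm_config")


def find_config_value(config, keys, nested_keys=NESTED_KEYS):
    """Return the first positive int found under `keys`, else None.

    Looks at the top-level config first, then recurses into any
    nested sub-config such as `text_config`.
    """
    for key in keys:
        value = getattr(config, key, None)
        if isinstance(value, int) and value > 0:
            return value
    for nested in nested_keys:
        sub = getattr(config, nested, None)
        if sub is not None:
            found = find_config_value(sub, keys, nested_keys)
            if found is not None:
                return found
    return None


# A multimodal-style config that only exposes sizes under text_config.
nested_cfg = SimpleNamespace(
    text_config=SimpleNamespace(num_hidden_layers=48, hidden_size=2048)
)
# A GPT-2-style config that uses n_layer / n_embd.
gpt2_style_cfg = SimpleNamespace(n_layer=12, n_embd=768)

print(find_config_value(nested_cfg, LAYER_KEYS))
print(find_config_value(gpt2_style_cfg, HIDDEN_KEYS))
```

Returning `None` rather than raising leaves it to the caller to decide whether a missing layer count is fatal for the estimate.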

Adds a --dry-run flag that estimates peak block-tuning VRAM, output size, and approximate time from the model config alone (AutoConfig only, no weights).

The VRAM estimate is built on auto-round's own block-wise memory model, reusing auto_round.utils.device.estimate_tuning_block_mem and get_moe_memory_ratio per maintainer feedback on intel#1592:
- peak memory at decoder-block granularity (card_0 = block I/O cache + layer activations + additional overhead)
- block input/output cache excluded when low_gpu_mem_usage is set
- MoE handled via get_moe_memory_ratio
- robust layer/hidden-size discovery across config field names and nested (text_config etc.) configs

Adds unit tests for helper reuse, the low_gpu_mem_usage path, MoE, and layer-count fallbacks.

Signed-off-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>
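
The commit message does not say how the output-size figure is derived. One common back-of-the-envelope approach for weight-only quantization is sketched below; `sketch_quantized_size_bytes` and all of its defaults (4-bit weights, group size 128, a 2-byte scale and a 4-bit zero point per group) are assumptions for illustration, and the sketch ignores layers that stay unquantized, such as embeddings and norms.

```python
def sketch_quantized_size_bytes(
    n_params: int,
    bits: int = 4,
    group_size: int = 128,
    scale_bytes: int = 2,
    zero_point_bits: int = 4,
) -> int:
    """Rough size of group-wise quantized weights, in bytes (illustrative)."""
    if group_size <= 0:
        raise ValueError("this sketch only handles a positive group_size")

    # Packed integer weights.
    packed_bytes = n_params * bits / 8
    # One scale and one zero point per group of weights.
    n_groups = n_params / group_size
    metadata_bytes = n_groups * (scale_bytes + zero_point_bits / 8)
    return int(packed_bytes + metadata_bytes)


# 1024 weights -> 512 packed bytes + 8 groups * 2.5 bytes of metadata.
print(sketch_quantized_size_bytes(1024))  # 532
```

With these defaults the per-group metadata adds about 4% on top of the packed weights, which is why a 4-bit export is somewhat larger than a quarter of the 16-bit size.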

for more information, see https://pre-commit.ci

Signed-off-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>

Matches the codebase convention (calib_dataset.py, utils/model.py) where every optional modelscope import carries a pylint E0401 disable, since modelscope is not in requirements. Fixes the Code-Scan-AutoRound failure.

Signed-off-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>

mvanhorn · 2026-06-27T17:29:41Z

Pushed a fix for the two red checks:

- Code-Scan (pylint E0401): the optional modelscope import in _load_auto_config now carries # pylint: disable=E0401, matching how calib_dataset.py and utils/model.py handle modelscope (it's not in requirements). Scan is now 10.00/10 locally.
- DCO: signed off all commits on the branch.
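
For readers unfamiliar with the pattern, an optional import guarded this way might look like the sketch below. It is not the body of `_load_auto_config`; the function name `resolve_auto_config` and the `prefer_modelscope` parameter are hypothetical, and it assumes `modelscope` exposes an `AutoConfig` alongside the one in `transformers`.

```python
def resolve_auto_config(prefer_modelscope: bool = False):
    """Return (backend_name, AutoConfig class), or (None, None) if neither
    library is importable. Hypothetical helper for illustration only."""
    if prefer_modelscope:
        try:
            # modelscope is optional and absent from requirements, so a
            # static scan reports E0401 (import-error) unless it is
            # disabled on this line.
            from modelscope import AutoConfig  # pylint: disable=E0401

            return "modelscope", AutoConfig
        except ImportError:
            pass  # fall through to transformers
    try:
        from transformers import AutoConfig

        return "transformers", AutoConfig
    except ImportError:
        return None, None


backend, config_cls = resolve_auto_config(prefer_modelscope=True)
print(backend)
```

Scoping the disable to the single import line keeps E0401 active everywhere else, so a genuinely missing required dependency is still reported.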

mvanhorn and others added 3 commits June 27, 2026 10:27

[pre-commit.ci] auto fixes from pre-commit.com hooks

96bf721


mvanhorn force-pushed the osc/feat-dry-run branch from 4649988 to 700c697 · June 27, 2026 17:28

This was linked to issues Jul 2, 2026

[Bug]: dry-run estimation should fall back to text_config for Qwen3.5-MoE #1689

Open

[Feature]: add --dry-run estimation mode #1591

Open

chensuyue requested review from lvliang-intel and n1ck-guo July 2, 2026 08:46

mvanhorn wants to merge 3 commits into intel:main from mvanhorn:osc/feat-dry-run
