[python][daft] Add Daft-side scan explain diagnostics #8017

Open
QuakeWang wants to merge 1 commit into apache:master from QuakeWang:daft-paimon-explain

@QuakeWang
Contributor

Purpose

close: #7998

Daft's Paimon reader already chooses internally between native Parquet reads and the pypaimon fallback, but that routing decision was not observable from the public Paimon Daft API. ReadBuilder.explain() describes only the Paimon scan plan, so users could not diagnose whether a slow scan was caused by PK merge, deletion vectors, BLOB columns, a non-Parquet format, or pushdown behavior.

This PR adds a structured Daft-side scan explain API:

  • explain_paimon_scan(...)
  • PaimonTable.explain_scan(...)

The result includes the underlying Paimon scan explain plus Daft reader routing details: native/fallback split and file counts, fallback reasons, pushed/remaining filters, projection/limit pushdown status, and optional per-split reader mode.
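As a rough illustration of the kind of summary described above, the following self-contained sketch tallies native and fallback splits, file counts, and fallback reasons. It is not the PR's code: `SplitInfo`, `ScanExplain`, `fallback_reason`, `summarize`, the reason strings, and the order in which conditions are checked are all hypothetical, and only the four fallback causes are taken from the description.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SplitInfo:
    """Hypothetical stand-in for a planned split; not a pypaimon class."""
    file_format: str
    file_count: int
    needs_pk_merge: bool = False
    has_deletion_vectors: bool = False
    has_blob_columns: bool = False


def fallback_reason(split: SplitInfo) -> Optional[str]:
    """Return why a split cannot use the native Parquet reader, or None."""
    # The check order here is an assumption made for this sketch.
    if split.file_format.lower() != "parquet":
        return "non_parquet_format"
    if split.needs_pk_merge:
        return "pk_merge"
    if split.has_deletion_vectors:
        return "deletion_vectors"
    if split.has_blob_columns:
        return "blob_columns"
    return None


@dataclass
class ScanExplain:
    """Hypothetical result object holding the routing summary."""
    native_splits: int = 0
    fallback_splits: int = 0
    native_files: int = 0
    fallback_files: int = 0
    fallback_reasons: Counter = field(default_factory=Counter)


def summarize(splits: List[SplitInfo]) -> ScanExplain:
    """Count splits and files per reader mode and tally fallback reasons."""
    result = ScanExplain()
    for split in splits:
        reason = fallback_reason(split)
        if reason is None:
            result.native_splits += 1
            result.native_files += split.file_count
        else:
            result.fallback_splits += 1
            result.fallback_files += split.file_count
            result.fallback_reasons[reason] += 1
    return result
```

A per-reason counter like this would let a user see at a glance which table feature is forcing the slower path, rather than only that a fallback occurred.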

The implementation reuses the same scan builder, partition filtering, and native/fallback routing helpers used by PaimonDataSource.get_tasks() to avoid divergence between diagnostics and actual execution.
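The design point here, one routing helper called from both the execution path and the diagnostic path, can be sketched in a few lines. This is a simplified model under stated assumptions, not the PR's implementation: splits are plain dicts, and `route_split`, `plan_tasks`, and `explain` are hypothetical names with a deliberately reduced routing rule.

```python
from typing import Dict, List, Tuple

# Hypothetical: a planned split modeled as a plain dict for this sketch.
Split = Dict[str, object]


def route_split(split: Split) -> str:
    """Single routing decision shared by execution and diagnostics."""
    is_parquet = split.get("format") == "parquet"
    needs_merge = bool(split.get("needs_merge", False))
    return "native" if is_parquet and not needs_merge else "fallback"


def plan_tasks(splits: List[Split]) -> List[Tuple[str, Split]]:
    """Execution path: pair each split with the reader that will run it."""
    return [(route_split(s), s) for s in splits]


def explain(splits: List[Split]) -> Dict[str, int]:
    """Diagnostic path: count splits per reader using the same helper."""
    counts = {"native": 0, "fallback": 0}
    for s in splits:
        counts[route_split(s)] += 1
    return counts
```

Because both functions delegate to `route_split`, a change to the routing rule is reflected in the explain output automatically, so the diagnostics cannot drift from what actually runs.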

Tests

  • pytest paimon-python/pypaimon/tests/daft/daft_explain_test.py -q
  • pytest paimon-python/pypaimon/tests/daft/daft_data_test.py paimon-python/pypaimon/tests/daft/daft_sink_test.py -q

@XiaoHongbo-Hope
Contributor

+1

Contributor

@leaves12138 left a comment

LGTM.

I reviewed the Daft scan explain plumbing and the reader-routing diagnostics. The explain path mirrors the real datasource planning path: it builds the same pushdown state, uses the same Paimon scan builder, applies the same partition-filter skipping, and classifies native Parquet vs pypaimon fallback splits with the same routing conditions. The public APIs on read_paimon, PaimonTable, and explain_paimon_scan look consistent.

Validated locally:

  • python3 -m py_compile pypaimon/daft/__init__.py pypaimon/daft/daft_catalog.py pypaimon/daft/daft_datasource.py pypaimon/daft/daft_explain.py pypaimon/daft/daft_paimon.py pypaimon/tests/daft/daft_explain_test.py
  • python3 -m flake8 --config=./dev/cfg.ini on the changed Daft files

I could not execute the Daft tests locally because this environment does not have daft installed, but the PR's Python checks are green.

Development

Successfully merging this pull request may close these issues.

[Feature] Add Daft-side scan explain for native Parquet and pypaimon fallback diagnostics

3 participants