[python][daft] Add Daft-side scan explain diagnostics #8017
Open
QuakeWang wants to merge 1 commit into
leaves12138
approved these changes
May 29, 2026
leaves12138
left a comment
LGTM.
I reviewed the Daft scan explain plumbing and the reader-routing diagnostics. The explain path mirrors the real datasource planning path: it builds the same pushdown state, uses the same Paimon scan builder, applies the same partition-filter skipping, and classifies native Parquet vs pypaimon fallback splits with the same routing conditions. The public APIs on read_paimon, PaimonTable, and explain_paimon_scan look consistent.
Validated locally:
python3 -m py_compile pypaimon/daft/__init__.py pypaimon/daft/daft_catalog.py pypaimon/daft/daft_datasource.py pypaimon/daft/daft_explain.py pypaimon/daft/daft_paimon.py pypaimon/tests/daft/daft_explain_test.py
python3 -m flake8 --config=./dev/cfg.ini on the changed Daft files
I could not execute the Daft tests locally because this environment does not have daft installed, but the PR's Python checks are green.
Purpose
close: #7998
Daft's Paimon reader already chooses between native Parquet reads and pypaimon fallback internally, but that routing decision was not observable from the public Paimon Daft API.
ReadBuilder.explain() only describes the Paimon scan plan, so users could not diagnose whether a slow scan was caused by PK merge, deletion vectors, BLOB columns, non-Parquet format, or pushdown behavior.
This PR adds a structured Daft-side scan explain API:
explain_paimon_scan(...)
PaimonTable.explain_scan(...)
The result includes the underlying Paimon scan explain plus Daft reader routing details: native/fallback split and file counts, fallback reasons, pushed/remaining filters, projection/limit pushdown status, and optional per-split reader mode.
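To make the shape of the routing details concrete, here is a minimal, self-contained sketch of how per-split fallback reasons could be aggregated into split and file counts. It does not use the PR's code: `SplitInfo`, `ScanReport`, `fallback_reasons`, `summarize`, and the reason strings are all hypothetical names chosen for illustration, and the actual fields returned by explain_paimon_scan(...) may differ.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass
class SplitInfo:
    """Hypothetical stand-in for the split properties that drive routing."""
    file_format: str = "parquet"
    file_count: int = 1
    needs_pk_merge: bool = False
    has_deletion_vectors: bool = False
    has_blob_columns: bool = False


def fallback_reasons(split: SplitInfo) -> List[str]:
    """Return why a split cannot use the native Parquet reader (empty if it can)."""
    reasons = []
    if split.needs_pk_merge:
        reasons.append("pk_merge")
    if split.has_deletion_vectors:
        reasons.append("deletion_vectors")
    if split.has_blob_columns:
        reasons.append("blob_columns")
    if split.file_format.lower() != "parquet":
        reasons.append("non_parquet_format")
    return reasons


@dataclass
class ScanReport:
    """Hypothetical summary: split/file counts per reader plus reason tallies."""
    native_splits: int = 0
    fallback_splits: int = 0
    native_files: int = 0
    fallback_files: int = 0
    fallback_reason_counts: Counter = field(default_factory=Counter)


def summarize(splits: List[SplitInfo]) -> ScanReport:
    """Classify each split once and accumulate counts into a report."""
    report = ScanReport()
    for split in splits:
        reasons = fallback_reasons(split)
        if reasons:
            report.fallback_splits += 1
            report.fallback_files += split.file_count
            report.fallback_reason_counts.update(reasons)
        else:
            report.native_splits += 1
            report.native_files += split.file_count
    return report
```

A split can carry more than one reason, so the reason tallies may sum to more than the number of fallback splits.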
The implementation reuses the same scan builder, partition filtering, and native/fallback routing helpers used by PaimonDataSource.get_tasks() to avoid divergence between diagnostics and actual execution.
Tests
pytest paimon-python/pypaimon/tests/daft/daft_explain_test.py -q
pytest paimon-python/pypaimon/tests/daft/daft_data_test.py paimon-python/pypaimon/tests/daft/daft_sink_test.py -q
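As an illustration of the "no divergence" property, the sketch below shows the kind of check a test could make when the explain path and the execution path call one shared routing helper. It is not taken from daft_explain_test.py: `route`, `plan_tasks`, `explain`, and the dict-based split representation are hypothetical stand-ins, and the routing condition is deliberately simplified.

```python
from typing import Dict, List


def route(split: Dict) -> str:
    """Hypothetical shared routing helper: returns 'native' or 'fallback'."""
    is_parquet = split.get("format", "parquet") == "parquet"
    needs_merge = split.get("pk_merge", False)
    return "native" if is_parquet and not needs_merge else "fallback"


def plan_tasks(splits: List[Dict]) -> List[str]:
    """Stand-in for the execution path: one reader mode per split."""
    return [route(s) for s in splits]


def explain(splits: List[Dict]) -> Dict[str, int]:
    """Stand-in for the explain path: counts derived from the same helper."""
    modes = [route(s) for s in splits]
    return {"native": modes.count("native"), "fallback": modes.count("fallback")}


def test_explain_matches_execution():
    splits = [
        {"format": "parquet"},
        {"format": "orc"},
        {"format": "parquet", "pk_merge": True},
    ]
    tasks = plan_tasks(splits)
    report = explain(splits)
    # Both paths call route(), so the counts agree by construction.
    assert report["native"] == tasks.count("native")
    assert report["fallback"] == tasks.count("fallback")
```

Because both functions delegate to `route`, changing a routing condition updates diagnostics and execution together, which is the design choice the description argues for.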