Preserve recursive CTE nullability across logical and physical planning#22037
Preserve recursive CTE nullability across logical and physical planning#22037kosiew wants to merge 25 commits into
Conversation
- Added `align_plan_to_schema` and `SchemaAlignExec` for improved schema alignment in execution plans. - Maintained strict behavior in `project_plan_to_schema` for projection-only cases. - Updated adapter to handle nullability narrowing while preserving SQL behavior. - Modified `RecursiveQueryExec` to preserve static/declared schema and aligned recursive term at plan construction. - Removed nullability-widening schema synthesis for cleaner execution. - Restored `0 AS` level in SQL logic test file `cte.slt`.
…ent behavior - Added direct tests for align_plan_to_schema: - Verified exact schema returns the same plan. - Ensured rename-only uses ProjectionExec. - Confirmed nullability narrowing uses SchemaAlignExec. - Tested count/type/field metadata/schema metadata errors. - Documented conservative property behavior in the adapter path.
- Refactored `align_plan_to_schema` function to store input schema in a variable, reducing redundant calls. - Updated validation and comparison logic for better clarity and performance. - Simplified partitioning handling in `SchemaAlignExec` by consolidating pattern matching. - Enhanced `DisplayAs` implementation to correctly handle `TreeRender` format.
…odules - Reuse `input_schema` in common.rs - Simplify projected return using `debug_assert_eq!` - Utilize `partition_count()` in common.rs - Modify TreeRender to return `Ok(())` - Reuse `static_schema` in tests for recursive_query.rs
- Removed redundant upfront align validation in common.rs. - Added test helpers in common.rs: - single_field_schema - single_i32_exec - metadata mismatch builders - Shortened repeated test setup in common.rs. - Added recursive_exec test helper in recursive_query.rs. - Simplified RecursiveQueryExec::try_new(...) in recursive_query.rs.
|
Please let me know if I'm understanding this correctly:
If that is accurate, then the proposed behavior would result in this query producing an error: SET datafusion.execution.enable_recursive_ctes = true;
WITH RECURSIVE t AS (
SELECT 0 AS n
UNION ALL
SELECT CAST(NULL AS INT) AS n FROM t WHERE n IS NOT NULL
)
SELECT * FROM t;( Instead, shouldn't we be computing the CTE's logical schema by widening the anchor and the recursive schemas? This is conceptually similar to what we do for |
…and tests - Added `schema: DFSchemaRef` to `RecursiveQuery`. - Updated `LogicalPlan::RecursiveQuery.schema()` to return the stored schema. - Introduced `RecursiveQuery::try_new(...)` for schema derivation based on static anchor field names, qualifiers, data types, nullability, and intersected metadata. - Implemented manual `PartialOrd` for `RecursiveQuery`. - Modified `to_recursive_query` to utilize `RecursiveQuery::try_new(...)`. - Added unit test for widening nullability in recursive query schema. - Ensured `RecursiveQuery` rebuilds correctly after child transforms using `try_new(...)`. - Updated deserialization of `RecursiveQuery` to leverage `try_new(...)`. - Enhanced `RecursiveQueryExec::try_new` to derive widened output schema using static and recursive schemas. - Introduced a helper function for generating recursive query output schema. - Updated tests for executive schema handling of recursive nullable outputs. - Added a SQL regression test to verify recursive term behavior and expected output.
- Addressed issue with the work table being planned with anchor/static schema only. - Modified logic to ensure that recursive term is planned once with anchor schema, preventing non-null optimizations that lead to infinite NULL emissions. - Built initial recursive CTE schema and recreated work table if schema nullability widened. - Replanned the recursive term using the widened work table schema to avoid inefficiencies.
- Added private `cte_work_table_plan` in `cte.rs` - Removed duplicated work-table source/scan construction in `cte.rs` - Simplified `recursive_query_schema` in `plan.rs` - Removed unnecessary Result wrapping in field collection in `plan.rs` - Used `Field::with_metadata` in `plan.rs` - Updated stale comment and used `Field::with_metadata` in `recursive_query.rs`
- Updated `RecursiveQuery::try_new` to validate column count and data types. - Added direct regression tests for logical plan. - Enhanced physical recursive schema to intersect field/schema metadata like logical schema. - Implemented metadata regression test in physical plan. - Improved `align_plan_to_schema` to align metadata via `SchemaAlignExec`. - Maintained behavior in `project_plan_to_schema` to reject metadata changes. - Added comment for projection-error fallback in common code. - Clarified comments regarding two-pass recursive planning in SQL component.
- Updated RecursiveQueryExec to accept declared logical recursive CTE schema. - Removed physical recursive schema recomputation, using logical schema as source of truth. - Aligned children to declared schema. - Introduced private recursive-CTE-local schema rebind exec for metadata/name/schema-only fixes. - Eliminated broad global align_plan_to_schema and SchemaAlignExec, retaining narrower project_plan_to_schema.
- Renamed helper function from `align_recursive_plan_to_schema` to `align_recursive_child_to_logical_schema`. - Updated fallback mechanism to preserve `project_plan_to_schema` errors when local rebind cannot handle cases safely. - `RecursiveSchemaRebindExec` now rejects: - Schema metadata mismatches - Field metadata mismatches - Column count mismatches - Type mismatches - Maintained support for nullability-only schema rebind. - Updated tests to include: - Nullability rebind test - Field metadata rejection test - Schema metadata rejection test
|
@neilconway and this suggestion |
- Updated function signature in recursive_query to take a reference - Updated internal call site in with_new_children to accommodate the change - Modified test helper and all affected test call sites in recursive_query - Updated planner call site in physical_planner to align with new function signature
…ionale - Added two comments in `plan.rs` to clarify the name-preservation invariant and nullability-widening rationale at the construction site. - Updated documentation in `recursive_query.rs` to note that `output_schema` is pre-widened, ensuring safe direction for recursive CTEs. - Introduced a new query in `cte.slt` to test distinct column aliases, reinforcing the invariant that the CTE's exposed column name comes from the anchor term.
neilconway
left a comment
There was a problem hiding this comment.
This query hangs now:
WITH RECURSIVE t(a, b) AS (
SELECT 0 AS a, 0 AS b
UNION ALL
SELECT b AS a, CAST(NULL AS INT) AS b FROM t WHERE a IS NOT NULL
)
SELECT * FROM t;because we incorrectly conclude that t.a is non-nullable, and so the optimizer elides the filter.
It seems a 2-pass approach isn't sufficient -- one approach would be to keep replanning until we reach a fixed point (which we should).
Alternatively, we could do something simpler: nullability analysis in DF is already quite conservative, and computing precise nullability for CTEs is not the low-hanging fruit if we want to make it more precise. Is it really that big of a loss if we mark CTE columns as nullable?
… apply to RecursiveQueryExec (apache#21912)
- Added SLT repro in datafusion/sqllogictest/test_files/cte.slt - Fixed recursive CTE work-table nullability: - Work table schema is now conservatively nullable - RecursiveQuery now stores output schema - Schema nullability considers static OR recursive term - Proto deserialize now rebuilds via builder - Updated affected EXPLAIN expectations
- Removed public RecursiveQuery.schema - Restored original public struct shape - Kept nullability handling internal: - Recursive builder coerces terms to conservative nullable schema via existing projection schema override - Optimizer child rewrites rebuild recursive query via builder - Aggregate planner reconciles nullability only for recursive-query inputs - Updated affected SLT explain output
…ly nullability widening - Only reconciles nullability widening; rejects mismatches in count, name, type, and field/schema metadata. - Removes zip truncation masking. - Renamed function contains_recursive_query_input for clarity. - Added comment to clarify aggregate recursive CTE special-case. - Updated plan_with_schema to use input schema expressions instead of target schema columns. - Introduced focused unit tests for validating allowed/rejected reconciliation cases. - Adjusted SLT explain to align with the new safer projection logic.
- Added `datafusion/common/src/recursive_schema.rs` with the following functions: - `make_schema_nullable` - `recursive_query_output_schema` - `reconcile_dfschema_with_schema_nullability` - Tests for nullability widening and reject mismatches. - Integrated the new schema helpers into existing components: - Updated `sql/src/cte.rs` to use the common nullable work-table schema helper. - Updated `expr/src/logical_plan/builder.rs` to use the common recursive output schema helper. - Updated `core/src/physical_planner.rs` to use the common physical/logical reconciliation helper. - Removed duplicated local helpers and tests from `core`, `expr`, and `sql`. Semver: - No breaking field/signature changes; added a doc-hidden helper module only.
- Changed projection expressions in the `parquet_recursive_projection_pushdown` test to use `CAST` for consistency and improved type safety.
- Refactored TreeNode::exists in physical planner and CTE modules - Removed redundant recursive CTE re-coercion in logical plan builder - Inlined small one-use variables in recursive schema module
|
I implemented |
|
run benchmark sql_planner |
|
🤖 Criterion benchmark running (GKE) | trigger CPU Details (lscpu)Comparing nullability-mismatch-22034 (aa7fb40) to 937dfda (merge-base) diff File an issue against this benchmark runner |
|
show benchmark queue |
|
Hi @kosiew, you asked to view the benchmark queue (#22037 (comment)).
File an issue against this benchmark runner |
|
🤖 Criterion benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagebase (merge-base)
branch
File an issue against this benchmark runner |
Which issue does this PR close?
Rationale for this change
Recursive CTEs can widen column nullability between the anchor term and recursive term. Prior to this change, the recursive work table and recursive query output schema inherited the anchor term schema directly, which could incorrectly preserve non-nullability.
This caused downstream logical optimizations and physical schema checks to operate on overly strict schemas. In particular, nullability-based simplifications could incorrectly remove semantically required predicates in recursive queries.
This PR makes recursive CTE schema handling conservative for nullability while preserving all other schema dimensions exactly.
What changes are included in this PR?
Added internal recursive schema reconciliation helpers in
datafusion_common::recursive_schemafor:Updated recursive CTE planning to:
Added
plan_with_schemahelper to rebuild plans with reconciled schemasUpdated
RecursiveQueryreconstruction paths in:Relaxed aggregate physical schema validation for recursive queries when the only mismatch is widened nullability
Simplified recursive work table reference detection using
LogicalPlan::existsUpdated explain output expectations to reflect explicit casts introduced by schema coercion
Added regression coverage for recursive CTE nullability handling
Are these changes tested?
Yes.
Added unit tests in
datafusion/common/src/recursive_schema.rscovering:Added sqllogictest coverage in
datafusion/sqllogictest/test_files/cte.sltfor recursive CTE nullability behavior, including a regression test ensuringIS NOT NULLpredicates are preserved correctly.Updated existing explain and physical plan expectations in:
datafusion/core/tests/sql/explain_analyze.rsdatafusion/sqllogictest/test_files/cte.sltdatafusion/sqllogictest/test_files/explain_tree.sltAre there any user-facing changes?
Yes.
Recursive CTEs now conservatively widen output nullability when recursive terms can produce nullable values. This fixes incorrect optimizer behavior and prevents physical planning failures caused by nullability mismatches in recursive queries.
LLM-generated code disclosure
This PR includes LLM-generated code and comments. All LLM-generated content has been manually reviewed and tested.