Preserve recursive CTE nullability across logical and physical planning by kosiew · Pull Request #22037 · apache/datafusion

kosiew · 2026-05-06T05:23:08Z

Which issue does this PR close?

Closes Recursive CTE Nullability Handling Should Preserve Logical Schema Without Requiring SQL Rewrites #22034.

Rationale for this change

Recursive CTEs can widen column nullability between the anchor term and recursive term. Prior to this change, the recursive work table and recursive query output schema inherited the anchor term schema directly, which could incorrectly preserve non-nullability.

This caused downstream logical optimizations and physical schema checks to operate on overly strict schemas. In particular, nullability-based simplifications could incorrectly remove semantically required predicates in recursive queries.

This PR makes recursive CTE schema handling conservative for nullability while preserving all other schema dimensions exactly.
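
The widening rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: `Field`, `field`, and `widen_nullability` are hypothetical, simplified stand-ins and are not the DataFusion or Arrow types or the helpers added by this PR. Names and types are taken from the anchor term, and a column is marked nullable if either term can produce NULL.

```rust
/// Hypothetical, simplified stand-in for a schema field (not DataFusion's type).
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: String,
    nullable: bool,
}

/// Convenience constructor for the sketch.
fn field(name: &str, data_type: &str, nullable: bool) -> Field {
    Field {
        name: name.to_string(),
        data_type: data_type.to_string(),
        nullable,
    }
}

/// Derive an output schema for a recursive CTE: names and types come from the
/// anchor term; a column is nullable if either term can produce NULL.
fn widen_nullability(anchor: &[Field], recursive: &[Field]) -> Result<Vec<Field>, String> {
    if anchor.len() != recursive.len() {
        return Err(format!(
            "column count mismatch: {} vs {}",
            anchor.len(),
            recursive.len()
        ));
    }
    anchor
        .iter()
        .zip(recursive)
        .map(|(a, r)| {
            if a.data_type != r.data_type {
                return Err(format!(
                    "type mismatch for column '{}': {} vs {}",
                    a.name, a.data_type, r.data_type
                ));
            }
            // Keep the anchor's name and type, widen only the nullability flag.
            Ok(Field {
                nullable: a.nullable || r.nullable,
                ..a.clone()
            })
        })
        .collect()
}

fn main() {
    // Anchor term: `SELECT 0 AS n` (non-nullable literal).
    let anchor = vec![field("n", "Int64", false)];
    // Recursive term: an aggregate that the planner treats as nullable.
    let recursive = vec![field("min(t.n)", "Int64", true)];
    let out = widen_nullability(&anchor, &recursive).unwrap();
    println!("{:?}", out);
}
```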

What changes are included in this PR?

- Added internal recursive schema reconciliation helpers in datafusion_common::recursive_schema for:
  - widening schema nullability while preserving metadata and field properties
  - deriving recursive query output schemas
  - reconciling logical and physical schemas when only nullability differs
- Updated recursive CTE planning to:
  - create nullable work table schemas
  - derive recursive query output schemas from both anchor and recursive terms
  - preserve anchor-term schema metadata, field names, and functional dependencies
- Added plan_with_schema helper to rebuild plans with reconciled schemas
- Updated RecursiveQuery reconstruction paths in:
  - logical plan rebuilding
  - protobuf deserialization
- Relaxed aggregate physical schema validation for recursive queries when the only mismatch is widened nullability
- Simplified recursive work table reference detection using LogicalPlan::exists
- Updated explain output expectations to reflect explicit casts introduced by schema coercion
- Added regression coverage for recursive CTE nullability handling

Are these changes tested?

Yes.

Added unit tests in datafusion/common/src/recursive_schema.rs covering:

- metadata preservation when widening nullability
- recursive output schema reconciliation
- nullability-only schema reconciliation
- rejection of unsupported schema mismatches
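
To make the "nullability-only reconciliation" and "rejection of unsupported mismatches" cases concrete, here is a minimal sketch. It assumes a hypothetical, simplified `Field` model and a hypothetical `reconcile_nullability` function; the real helpers in `recursive_schema.rs` operate on DataFusion and Arrow schema types and may differ in detail. The idea shown is that two schemas reconcile only if they agree on column count, names, types, and metadata, in which case the result is nullable wherever either input is.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified field model (not the Arrow or DataFusion type).
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: String,
    nullable: bool,
    metadata: BTreeMap<String, String>,
}

/// Convenience constructor for the sketch (empty metadata).
fn field(name: &str, data_type: &str, nullable: bool) -> Field {
    Field {
        name: name.to_string(),
        data_type: data_type.to_string(),
        nullable,
        metadata: BTreeMap::new(),
    }
}

/// Reconcile two schemas only if every difference is a nullability difference.
/// Any mismatch in count, name, type, or field metadata is rejected.
fn reconcile_nullability(logical: &[Field], physical: &[Field]) -> Result<Vec<Field>, String> {
    if logical.len() != physical.len() {
        return Err("column count mismatch".to_string());
    }
    let mut out = Vec::with_capacity(logical.len());
    for (l, p) in logical.iter().zip(physical) {
        if l.name != p.name {
            return Err(format!("name mismatch: {} vs {}", l.name, p.name));
        }
        if l.data_type != p.data_type {
            return Err(format!("type mismatch for '{}'", l.name));
        }
        if l.metadata != p.metadata {
            return Err(format!("metadata mismatch for '{}'", l.name));
        }
        // The only tolerated difference: widen to nullable if either side is.
        out.push(Field {
            nullable: l.nullable || p.nullable,
            ..l.clone()
        });
    }
    Ok(out)
}

fn main() {
    let logical = vec![field("n", "Int32", false)];
    // Differs only in nullability: accepted and widened.
    let physical = vec![field("n", "Int32", true)];
    println!("{:?}", reconcile_nullability(&logical, &physical));
    // Differs in type: rejected.
    let wrong_type = vec![field("n", "Utf8", true)];
    println!("{:?}", reconcile_nullability(&logical, &wrong_type));
}
```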

Added sqllogictest coverage in datafusion/sqllogictest/test_files/cte.slt for recursive CTE nullability behavior, including a regression test ensuring IS NOT NULL predicates are preserved correctly.

Updated existing explain and physical plan expectations in:

- datafusion/core/tests/sql/explain_analyze.rs
- datafusion/sqllogictest/test_files/cte.slt
- datafusion/sqllogictest/test_files/explain_tree.slt

Are there any user-facing changes?

Yes.

Recursive CTEs now conservatively widen output nullability when recursive terms can produce nullable values. This fixes incorrect optimizer behavior and prevents physical planning failures caused by nullability mismatches in recursive queries.

LLM-generated code disclosure

This PR includes LLM-generated code and comments. All LLM-generated content has been manually reviewed and tested.

- Added `align_plan_to_schema` and `SchemaAlignExec` for improved schema alignment in execution plans.
- Maintained strict behavior in `project_plan_to_schema` for projection-only cases.
- Updated adapter to handle nullability narrowing while preserving SQL behavior.
- Modified `RecursiveQueryExec` to preserve static/declared schema and aligned recursive term at plan construction.
- Removed nullability-widening schema synthesis for cleaner execution.
- Restored `0 AS` level in SQL logic test file `cte.slt`.

…ent behavior
- Added direct tests for align_plan_to_schema:
  - Verified exact schema returns the same plan.
  - Ensured rename-only uses ProjectionExec.
  - Confirmed nullability narrowing uses SchemaAlignExec.
  - Tested count/type/field metadata/schema metadata errors.
- Documented conservative property behavior in the adapter path.

- Refactored `align_plan_to_schema` function to store input schema in a variable, reducing redundant calls.
- Updated validation and comparison logic for better clarity and performance.
- Simplified partitioning handling in `SchemaAlignExec` by consolidating pattern matching.
- Enhanced `DisplayAs` implementation to correctly handle `TreeRender` format.

…odules
- Reuse `input_schema` in common.rs
- Simplify projected return using `debug_assert_eq!`
- Utilize `partition_count()` in common.rs
- Modify TreeRender to return `Ok(())`
- Reuse `static_schema` in tests for recursive_query.rs

- Removed redundant upfront align validation in common.rs.
- Added test helpers in common.rs:
  - single_field_schema
  - single_i32_exec
  - metadata mismatch builders
- Shortened repeated test setup in common.rs.
- Added recursive_exec test helper in recursive_query.rs.
- Simplified RecursiveQueryExec::try_new(...) in recursive_query.rs.

neilconway · 2026-05-09T14:53:00Z

Please let me know if I'm understanding this correctly:

- The PR aims to address a situation where there is a schema mismatch between the anchor and recursive cases in a CTE
- In particular, we might infer different nullability properties between the anchor vs the recursive query -- e.g., if we have 0 in the anchor and min(...) in the recursive case, 0 is non-nullable and min(...) is nullable (as an aside, the latter is conservative: min(x) without FILTER in a grouped query is non-nullable if x is non-nullable, but I suppose this is a separate planner shortcoming...)
- The proposed behavior is to apply the anchor schema to the recursive CTE schema. So we would effectively be requiring that a nullable min expression never returns a NULL, in the example above
- If the recursive query does return a NULL, we produce a runtime error

If that is accurate, then the proposed behavior would result in this query producing an error:

SET datafusion.execution.enable_recursive_ctes = true;

  WITH RECURSIVE t AS (
    SELECT 0 AS n
    UNION ALL
    SELECT CAST(NULL AS INT) AS n FROM t WHERE n IS NOT NULL
  )
  SELECT * FROM t;

(Column 'n' is declared as non-nullable but contains null values) -- but this query seems entirely reasonable to me and is allowed by other SQL implementations (e.g., Postgres, DuckDB, MariaDB, SQLite).

Instead, shouldn't we be computing the CTE's logical schema by widening the anchor and the recursive schemas? This is conceptually similar to what we do for UNION. That is, if the anchor expression is non-nullable and the recursive expression is nullable, the output schema should be nullable.

…and tests
- Added `schema: DFSchemaRef` to `RecursiveQuery`.
- Updated `LogicalPlan::RecursiveQuery.schema()` to return the stored schema.
- Introduced `RecursiveQuery::try_new(...)` for schema derivation based on static anchor field names, qualifiers, data types, nullability, and intersected metadata.
- Implemented manual `PartialOrd` for `RecursiveQuery`.
- Modified `to_recursive_query` to utilize `RecursiveQuery::try_new(...)`.
- Added unit test for widening nullability in recursive query schema.
- Ensured `RecursiveQuery` rebuilds correctly after child transforms using `try_new(...)`.
- Updated deserialization of `RecursiveQuery` to leverage `try_new(...)`.
- Enhanced `RecursiveQueryExec::try_new` to derive widened output schema using static and recursive schemas.
- Introduced a helper function for generating recursive query output schema.
- Updated tests for executive schema handling of recursive nullable outputs.
- Added a SQL regression test to verify recursive term behavior and expected output.

- Addressed issue with the work table being planned with anchor/static schema only.
- Modified logic to ensure that recursive term is planned once with anchor schema, preventing non-null optimizations that lead to infinite NULL emissions.
- Built initial recursive CTE schema and recreated work table if schema nullability widened.
- Replanned the recursive term using the widened work table schema to avoid inefficiencies.

- Added private `cte_work_table_plan` in `cte.rs`
- Removed duplicated work-table source/scan construction in `cte.rs`
- Simplified `recursive_query_schema` in `plan.rs`
- Removed unnecessary Result wrapping in field collection in `plan.rs`
- Used `Field::with_metadata` in `plan.rs`
- Updated stale comment and used `Field::with_metadata` in `recursive_query.rs`

- Updated `RecursiveQuery::try_new` to validate column count and data types.
- Added direct regression tests for logical plan.
- Enhanced physical recursive schema to intersect field/schema metadata like logical schema.
- Implemented metadata regression test in physical plan.
- Improved `align_plan_to_schema` to align metadata via `SchemaAlignExec`.
- Maintained behavior in `project_plan_to_schema` to reject metadata changes.
- Added comment for projection-error fallback in common code.
- Clarified comments regarding two-pass recursive planning in SQL component.

- Updated RecursiveQueryExec to accept declared logical recursive CTE schema.
- Removed physical recursive schema recomputation, using logical schema as source of truth.
- Aligned children to declared schema.
- Introduced private recursive-CTE-local schema rebind exec for metadata/name/schema-only fixes.
- Eliminated broad global align_plan_to_schema and SchemaAlignExec, retaining narrower project_plan_to_schema.

- Renamed helper function from `align_recursive_plan_to_schema` to `align_recursive_child_to_logical_schema`.
- Updated fallback mechanism to preserve `project_plan_to_schema` errors when local rebind cannot handle cases safely.
- `RecursiveSchemaRebindExec` now rejects:
  - Schema metadata mismatches
  - Field metadata mismatches
  - Column count mismatches
  - Type mismatches
- Maintained support for nullability-only schema rebind.
- Updated tests to include:
  - Nullability rebind test
  - Field metadata rejection test
  - Schema metadata rejection test

kosiew · 2026-05-11T01:42:01Z

@neilconway
Thanks for this test case:

SET datafusion.execution.enable_recursive_ctes = true;

  WITH RECURSIVE t AS (
    SELECT 0 AS n
    UNION ALL
    SELECT CAST(NULL AS INT) AS n FROM t WHERE n IS NOT NULL
  )
  SELECT * FROM t;

and this suggestion

shouldn't we be computing the CTE's logical schema by widening the anchor and the recursive schemas? This is conceptually similar to what we do for UNION. That is, if the anchor expression is non-nullable and the recursive expression is nullable, the output schema should be nullable.

- Updated function signature in recursive_query to take a reference
- Updated internal call site in with_new_children to accommodate the change
- Modified test helper and all affected test call sites in recursive_query
- Updated planner call site in physical_planner to align with new function signature

…ving metadata

…ionale
- Added two comments in `plan.rs` to clarify the name-preservation invariant and nullability-widening rationale at the construction site.
- Updated documentation in `recursive_query.rs` to note that `output_schema` is pre-widened, ensuring safe direction for recursive CTEs.
- Introduced a new query in `cte.slt` to test distinct column aliases, reinforcing the invariant that the CTE's exposed column name comes from the anchor term.

neilconway

This query hangs now:

  WITH RECURSIVE t(a, b) AS (
    SELECT 0 AS a, 0 AS b
    UNION ALL
    SELECT b AS a, CAST(NULL AS INT) AS b FROM t WHERE a IS NOT NULL
  )
  SELECT * FROM t;

because we incorrectly conclude that t.a is non-nullable, and so the optimizer elides the filter.

It seems a 2-pass approach isn't sufficient -- one approach would be to keep replanning until we reach a fixed point (which we should).

Alternatively, we could do something simpler: nullability analysis in DF is already quite conservative, and computing precise nullability for CTEs is not the low-hanging fruit if we want to make it more precise. Is it really that big of a loss if we mark CTE columns as nullable?
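
The reason a fixed number of passes falls short can be shown with a toy model. This is a hedged sketch, not DataFusion code: `Expr`, `widen_once`, and `fixed_point` are hypothetical names, and each recursive-term column is modeled as either a reference to a work-table column or a literal. For the `t(a, b)` query above, one widening step still leaves `a` non-nullable, because `a` only becomes nullable after `b` has been widened.

```rust
/// How a recursive-term output column is computed (hypothetical toy model).
#[derive(Debug, Clone, Copy)]
enum Expr {
    /// Reference to a work-table column by index.
    Column(usize),
    /// A literal, flagged by whether it can be NULL.
    Literal { nullable: bool },
}

/// Nullability of one expression, given the current work-table nullability.
fn expr_nullable(expr: Expr, work_table: &[bool]) -> bool {
    match expr {
        Expr::Column(i) => work_table[i],
        Expr::Literal { nullable } => nullable,
    }
}

/// One planning pass: widen the current nullability with what the recursive
/// term produces when planned against that same work-table schema.
fn widen_once(current: &[bool], recursive: &[Expr]) -> Vec<bool> {
    current
        .iter()
        .zip(recursive)
        .map(|(&n, &e)| n || expr_nullable(e, current))
        .collect()
}

/// Repeat `widen_once` until nothing changes. Nullability only ever flips
/// from false to true, so the loop terminates. Returns the final nullability
/// and the number of passes performed (including the confirming pass).
fn fixed_point(anchor: &[bool], recursive: &[Expr]) -> (Vec<bool>, usize) {
    let mut current = anchor.to_vec();
    let mut passes = 0;
    loop {
        let next = widen_once(&current, recursive);
        passes += 1;
        if next == current {
            return (current, passes);
        }
        current = next;
    }
}

fn main() {
    // Anchor: SELECT 0 AS a, 0 AS b  (both non-nullable).
    let anchor = [false, false];
    // Recursive: SELECT b AS a, CAST(NULL AS INT) AS b FROM t ...
    let recursive = [Expr::Column(1), Expr::Literal { nullable: true }];

    // A single widening step marks only `b` as nullable.
    println!("after one pass: {:?}", widen_once(&anchor, &recursive));
    // Iterating to a fixed point marks both columns as nullable.
    println!("fixed point:    {:?}", fixed_point(&anchor, &recursive));
}
```

In this model a dependency chain of k columns needs on the order of k passes to converge, which is one way to see why simply marking every recursive CTE column as nullable is the cheaper alternative.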

… apply to RecursiveQueryExec (apache#21912)

- Added SLT repro in datafusion/sqllogictest/test_files/cte.slt
- Fixed recursive CTE work-table nullability:
  - Work table schema is now conservatively nullable
  - RecursiveQuery now stores output schema
  - Schema nullability considers static OR recursive term
  - Proto deserialize now rebuilds via builder
- Updated affected EXPLAIN expectations

- Removed public RecursiveQuery.schema
- Restored original public struct shape
- Kept nullability handling internal:
  - Recursive builder coerces terms to conservative nullable schema via existing projection schema override
  - Optimizer child rewrites rebuild recursive query via builder
  - Aggregate planner reconciles nullability only for recursive-query inputs
- Updated affected SLT explain output

…ction

…ly nullability widening
- Only reconciles nullability widening; rejects mismatches in count, name, type, and field/schema metadata.
- Removes zip truncation masking.
- Renamed function contains_recursive_query_input for clarity.
- Added comment to clarify aggregate recursive CTE special-case.
- Updated plan_with_schema to use input schema expressions instead of target schema columns.
- Introduced focused unit tests for validating allowed/rejected reconciliation cases.
- Adjusted SLT explain to align with the new safer projection logic.

- Added `datafusion/common/src/recursive_schema.rs` with the following functions:
  - `make_schema_nullable`
  - `recursive_query_output_schema`
  - `reconcile_dfschema_with_schema_nullability`
  - Tests for nullability widening and reject mismatches.
- Integrated the new schema helpers into existing components:
  - Updated `sql/src/cte.rs` to use the common nullable work-table schema helper.
  - Updated `expr/src/logical_plan/builder.rs` to use the common recursive output schema helper.
  - Updated `core/src/physical_planner.rs` to use the common physical/logical reconciliation helper.
- Removed duplicated local helpers and tests from `core`, `expr`, and `sql`.

Semver:
- No breaking field/signature changes; added a doc-hidden helper module only.

- Changed projection expressions in the `parquet_recursive_projection_pushdown` test to use `CAST` for consistency and improved type safety.

- Refactored TreeNode::exists in physical planner and CTE modules
- Removed redundant recursive CTE re-coercion in logical plan builder
- Inlined small one-use variables in recursive schema module

kosiew · 2026-05-13T13:50:05Z

@neilconway

I implemented the simpler option of marking recursive CTE columns as nullable, and added an slt test for:

  WITH RECURSIVE t(a, b) AS (
    SELECT 0 AS a, 0 AS b
    UNION ALL
    SELECT b AS a, CAST(NULL AS INT) AS b FROM t WHERE a IS NOT NULL
  )
  SELECT * FROM t;

kosiew · 2026-05-14T06:18:08Z

run benchmark sql_planner

adriangbot · 2026-05-14T06:22:08Z

🤖 Criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4448157662-72-fnj6f 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)

Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing nullability-mismatch-22034 (aa7fb40) to 937dfda (merge-base) diff
BENCH_NAME=sql_planner
BENCH_COMMAND=cargo bench --features=parquet --bench sql_planner
BENCH_FILTER=
Results will be posted here when complete

File an issue against this benchmark runner

kosiew · 2026-05-14T06:41:03Z

show benchmark queue

adriangbot · 2026-05-14T06:41:06Z

Hi @kosiew, you asked to view the benchmark queue (#22037 (comment)).

Comment	Repo	PR	User	Benchmarks	Status
#4448041895	apache/datafusion	#20381	adriangb	["clickbench_partitioned"]	running
#4448041895	apache/datafusion	#20381	adriangb	["tpcds"]	running
#4448145440	apache/arrow-rs	#9972	adriangb	["arrow_writer"]	running
#4448157662	apache/datafusion	#22037	kosiew	["sql_planner"]	running

adriangbot · 2026-05-14T07:19:24Z

🤖 Criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

group                                                 main                                   nullability-mismatch-22034
-----                                                 ----                                   --------------------------
logical_aggregate_with_join                           1.00    416.0±1.43µs        ? ?/sec    1.01    418.8±2.83µs        ? ?/sec
logical_plan_struct_join_agg_sort                     1.00    160.8±1.66µs        ? ?/sec    1.04    167.2±0.78µs        ? ?/sec
logical_select_all_from_1000                          1.03      8.3±0.03ms        ? ?/sec    1.00      8.0±0.02ms        ? ?/sec
logical_select_one_from_700                           1.01    275.4±1.57µs        ? ?/sec    1.00    273.8±1.53µs        ? ?/sec
logical_trivial_join_high_numbered_columns            1.01    257.8±0.55µs        ? ?/sec    1.00    254.1±2.79µs        ? ?/sec
logical_trivial_join_low_numbered_columns             1.02    244.8±0.58µs        ? ?/sec    1.00    240.3±0.55µs        ? ?/sec
physical_intersection                                 1.00    587.5±2.18µs        ? ?/sec    1.01    592.9±1.63µs        ? ?/sec
physical_join_consider_sort                           1.00   1017.7±3.66µs        ? ?/sec    1.01   1030.2±3.45µs        ? ?/sec
physical_join_distinct                                1.02    239.4±4.44µs        ? ?/sec    1.00    235.5±0.52µs        ? ?/sec
physical_many_self_joins                              1.01      7.7±0.01ms        ? ?/sec    1.00      7.7±0.02ms        ? ?/sec
physical_plan_clickbench_all                          1.00    128.6±0.96ms        ? ?/sec    1.00    128.2±2.24ms        ? ?/sec
physical_plan_clickbench_q1                           1.00   1343.8±5.00µs        ? ?/sec    1.03   1378.6±5.71µs        ? ?/sec
physical_plan_clickbench_q10                          1.01      2.1±0.01ms        ? ?/sec    1.00      2.0±0.00ms        ? ?/sec
physical_plan_clickbench_q11                          1.01      2.2±0.04ms        ? ?/sec    1.00      2.2±0.01ms        ? ?/sec
physical_plan_clickbench_q12                          1.00      2.3±0.01ms        ? ?/sec    1.01      2.3±0.01ms        ? ?/sec
physical_plan_clickbench_q13                          1.00      2.0±0.04ms        ? ?/sec    1.00      2.1±0.01ms        ? ?/sec
physical_plan_clickbench_q14                          1.00      2.2±0.00ms        ? ?/sec    1.01      2.2±0.01ms        ? ?/sec
physical_plan_clickbench_q15                          1.00      2.1±0.00ms        ? ?/sec    1.01      2.1±0.01ms        ? ?/sec
physical_plan_clickbench_q16                          1.00  1729.1±17.91µs        ? ?/sec    1.01   1749.9±5.06µs        ? ?/sec
physical_plan_clickbench_q17                          1.00   1779.9±6.55µs        ? ?/sec    1.01  1796.7±12.88µs        ? ?/sec
physical_plan_clickbench_q18                          1.00   1604.9±9.66µs        ? ?/sec    1.00   1612.3±5.24µs        ? ?/sec
physical_plan_clickbench_q19                          1.00   1984.2±6.15µs        ? ?/sec    1.00   1983.2±5.99µs        ? ?/sec
physical_plan_clickbench_q2                           1.00   1755.5±5.11µs        ? ?/sec    1.03  1800.8±34.04µs        ? ?/sec
physical_plan_clickbench_q20                          1.01  1521.8±17.10µs        ? ?/sec    1.00   1505.9±5.57µs        ? ?/sec
physical_plan_clickbench_q21                          1.01  1767.9±19.29µs        ? ?/sec    1.00   1747.0±4.73µs        ? ?/sec
physical_plan_clickbench_q22                          1.00      2.2±0.03ms        ? ?/sec    1.00      2.2±0.01ms        ? ?/sec
physical_plan_clickbench_q23                          1.02      2.4±0.01ms        ? ?/sec    1.00      2.4±0.01ms        ? ?/sec
physical_plan_clickbench_q24                          1.01      5.8±0.02ms        ? ?/sec    1.00      5.7±0.01ms        ? ?/sec
physical_plan_clickbench_q25                          1.03  1953.3±13.34µs        ? ?/sec    1.00   1899.7±5.70µs        ? ?/sec
physical_plan_clickbench_q26                          1.02   1747.4±9.38µs        ? ?/sec    1.00   1720.8±6.65µs        ? ?/sec
physical_plan_clickbench_q27                          1.02   1966.2±5.84µs        ? ?/sec    1.00   1922.6±8.51µs        ? ?/sec
physical_plan_clickbench_q28                          1.00      2.4±0.01ms        ? ?/sec    1.01      2.4±0.01ms        ? ?/sec
physical_plan_clickbench_q29                          1.00      2.6±0.01ms        ? ?/sec    1.02      2.6±0.01ms        ? ?/sec
physical_plan_clickbench_q3                           1.00   1614.5±7.03µs        ? ?/sec    1.02   1646.0±5.18µs        ? ?/sec
physical_plan_clickbench_q30                          1.00     16.2±0.09ms        ? ?/sec    1.00     16.3±0.03ms        ? ?/sec
physical_plan_clickbench_q31                          1.00      2.4±0.00ms        ? ?/sec    1.02      2.5±0.01ms        ? ?/sec
physical_plan_clickbench_q32                          1.00      2.4±0.02ms        ? ?/sec    1.02      2.5±0.00ms        ? ?/sec
physical_plan_clickbench_q33                          1.00   1982.0±6.42µs        ? ?/sec    1.02      2.0±0.01ms        ? ?/sec
physical_plan_clickbench_q34                          1.00   1728.4±4.79µs        ? ?/sec    1.01   1753.5±4.45µs        ? ?/sec
physical_plan_clickbench_q35                          1.00  1796.6±25.27µs        ? ?/sec    1.02  1834.6±14.12µs        ? ?/sec
physical_plan_clickbench_q36                          1.00      2.1±0.01ms        ? ?/sec    1.01      2.1±0.01ms        ? ?/sec
physical_plan_clickbench_q37                          1.00      2.6±0.03ms        ? ?/sec    1.01      2.6±0.01ms        ? ?/sec
physical_plan_clickbench_q38                          1.00      2.6±0.00ms        ? ?/sec    1.01      2.6±0.01ms        ? ?/sec
physical_plan_clickbench_q39                          1.00      2.7±0.04ms        ? ?/sec    1.00      2.7±0.00ms        ? ?/sec
physical_plan_clickbench_q4                           1.00   1417.0±5.96µs        ? ?/sec    1.02   1442.7±6.32µs        ? ?/sec
physical_plan_clickbench_q40                          1.00      3.5±0.06ms        ? ?/sec    1.00      3.5±0.01ms        ? ?/sec
physical_plan_clickbench_q41                          1.00      3.0±0.01ms        ? ?/sec    1.00      2.9±0.01ms        ? ?/sec
physical_plan_clickbench_q42                          1.00      3.1±0.01ms        ? ?/sec    1.00      3.2±0.01ms        ? ?/sec
physical_plan_clickbench_q43                          1.01      3.3±0.08ms        ? ?/sec    1.00      3.2±0.01ms        ? ?/sec
physical_plan_clickbench_q44                          1.00   1527.5±6.71µs        ? ?/sec    1.00  1529.8±22.03µs        ? ?/sec
physical_plan_clickbench_q45                          1.00   1531.1±4.74µs        ? ?/sec    1.01   1541.5±4.15µs        ? ?/sec
physical_plan_clickbench_q46                          1.01  1873.0±14.69µs        ? ?/sec    1.00   1857.3±4.74µs        ? ?/sec
physical_plan_clickbench_q47                          1.00      2.6±0.00ms        ? ?/sec    1.00      2.6±0.04ms        ? ?/sec
physical_plan_clickbench_q48                          1.01      2.9±0.03ms        ? ?/sec    1.00      2.9±0.01ms        ? ?/sec
physical_plan_clickbench_q49                          1.01      2.9±0.02ms        ? ?/sec    1.00      2.9±0.01ms        ? ?/sec
physical_plan_clickbench_q5                           1.00   1561.8±5.87µs        ? ?/sec    1.01   1582.5±4.51µs        ? ?/sec
physical_plan_clickbench_q50                          1.01      2.9±0.08ms        ? ?/sec    1.00      2.9±0.02ms        ? ?/sec
physical_plan_clickbench_q51                          1.02      2.0±0.01ms        ? ?/sec    1.00  1972.8±12.96µs        ? ?/sec
physical_plan_clickbench_q6                           1.03   1604.0±6.31µs        ? ?/sec    1.00   1550.3±5.09µs        ? ?/sec
physical_plan_clickbench_q7                           1.03   1636.5±6.31µs        ? ?/sec    1.00   1585.1±5.68µs        ? ?/sec
physical_plan_clickbench_q8                           1.04   1950.4±5.04µs        ? ?/sec    1.00   1880.3±6.35µs        ? ?/sec
physical_plan_clickbench_q9                           1.00   1920.7±5.31µs        ? ?/sec    1.00   1915.5±8.98µs        ? ?/sec
physical_plan_struct_join_agg_sort                    1.00   1383.6±2.62µs        ? ?/sec    1.00   1390.1±2.44µs        ? ?/sec
physical_plan_tpcds_all                               1.00    722.1±3.05ms        ? ?/sec    1.02    734.0±3.46ms        ? ?/sec
physical_plan_tpch_all                                1.00     47.5±0.26ms        ? ?/sec    1.02     48.6±0.05ms        ? ?/sec
physical_plan_tpch_q1                                 1.00   1564.1±3.08µs        ? ?/sec    1.02   1592.9±2.44µs        ? ?/sec
physical_plan_tpch_q10                                1.00      3.0±0.01ms        ? ?/sec    1.01      3.0±0.02ms        ? ?/sec
physical_plan_tpch_q11                                1.00      2.2±0.01ms        ? ?/sec    1.01      2.2±0.01ms        ? ?/sec
physical_plan_tpch_q12                                1.00   1312.3±4.03µs        ? ?/sec    1.01   1319.3±4.11µs        ? ?/sec
physical_plan_tpch_q13                                1.00    997.1±5.86µs        ? ?/sec    1.00   1001.2±4.42µs        ? ?/sec
physical_plan_tpch_q14                                1.00   1424.6±3.56µs        ? ?/sec    1.02   1455.1±3.28µs        ? ?/sec
physical_plan_tpch_q16                                1.00   1662.6±3.84µs        ? ?/sec    1.00   1670.4±7.27µs        ? ?/sec
physical_plan_tpch_q17                                1.00   1781.8±5.22µs        ? ?/sec    1.02  1810.8±10.18µs        ? ?/sec
physical_plan_tpch_q18                                1.00   1887.9±5.00µs        ? ?/sec    1.01   1908.0±8.93µs        ? ?/sec
physical_plan_tpch_q19                                1.00      2.5±0.01ms        ? ?/sec    1.03      2.5±0.04ms        ? ?/sec
physical_plan_tpch_q2                                 1.00      4.3±0.00ms        ? ?/sec    1.01      4.4±0.00ms        ? ?/sec
physical_plan_tpch_q20                                1.00      2.4±0.00ms        ? ?/sec    1.01      2.4±0.02ms        ? ?/sec
physical_plan_tpch_q21                                1.00      3.1±0.01ms        ? ?/sec    1.00      3.2±0.00ms        ? ?/sec
physical_plan_tpch_q22                                1.00   1532.3±2.73µs        ? ?/sec    1.01   1545.3±2.12µs        ? ?/sec
physical_plan_tpch_q3                                 1.00      2.0±0.01ms        ? ?/sec    1.02      2.0±0.00ms        ? ?/sec
physical_plan_tpch_q4                                 1.00   1171.6±3.30µs        ? ?/sec    1.01   1184.0±2.00µs        ? ?/sec
physical_plan_tpch_q5                                 1.00      2.6±0.02ms        ? ?/sec    1.02      2.7±0.04ms        ? ?/sec
physical_plan_tpch_q6                                 1.00    658.8±2.10µs        ? ?/sec    1.03    680.3±1.98µs        ? ?/sec
physical_plan_tpch_q7                                 1.00      3.2±0.03ms        ? ?/sec    1.01      3.2±0.02ms        ? ?/sec
physical_plan_tpch_q8                                 1.00      4.2±0.01ms        ? ?/sec    1.01      4.2±0.05ms        ? ?/sec
physical_plan_tpch_q9                                 1.00      2.9±0.00ms        ? ?/sec    1.01      2.9±0.00ms        ? ?/sec
physical_select_aggregates_from_200                   1.00     14.7±0.02ms        ? ?/sec    1.00     14.7±0.04ms        ? ?/sec
physical_select_all_from_1000                         1.02     18.0±0.05ms        ? ?/sec    1.00     17.5±0.04ms        ? ?/sec
physical_select_one_from_700                          1.01    722.8±1.74µs        ? ?/sec    1.00    718.4±2.55µs        ? ?/sec
physical_sorted_union_order_by_10_int64               1.00      4.8±0.01ms        ? ?/sec    1.00      4.8±0.01ms        ? ?/sec
physical_sorted_union_order_by_10_uint64              1.00     11.6±0.01ms        ? ?/sec    1.00     11.6±0.03ms        ? ?/sec
physical_sorted_union_order_by_50_int64               1.00    113.1±0.27ms        ? ?/sec    1.00    113.4±0.42ms        ? ?/sec
physical_sorted_union_order_by_50_uint64              1.00    603.3±2.67ms        ? ?/sec    1.01    607.3±2.79ms        ? ?/sec
physical_theta_join_consider_sort                     1.00   1052.7±3.43µs        ? ?/sec    1.01   1064.5±2.64µs        ? ?/sec
physical_unnest_to_join                               1.00    616.0±1.68µs        ? ?/sec    1.02    627.3±1.70µs        ? ?/sec
physical_window_function_partition_by_12_on_values    1.00    742.5±2.35µs        ? ?/sec    1.02    755.0±2.67µs        ? ?/sec
physical_window_function_partition_by_30_on_values    1.00   1490.4±3.60µs        ? ?/sec    1.01   1499.8±2.64µs        ? ?/sec
physical_window_function_partition_by_4_on_values     1.00    439.8±1.22µs        ? ?/sec    1.05    460.4±1.28µs        ? ?/sec
physical_window_function_partition_by_7_on_values     1.00    551.4±2.09µs        ? ?/sec    1.03    566.7±1.19µs        ? ?/sec
physical_window_function_partition_by_8_on_values     1.00    593.8±1.91µs        ? ?/sec    1.02    606.9±1.39µs        ? ?/sec
with_param_values_many_columns                        1.01    461.3±2.71µs        ? ?/sec    1.00    456.2±2.10µs        ? ?/sec

Resource Usage

base (merge-base)

Metric	Value
Wall time	1240.3s
Peak memory	19.7 GiB
Avg memory	19.7 GiB
CPU user	1479.6s
CPU sys	1.6s
Peak spill	0 B

branch

Metric	Value
Wall time	1245.3s
Peak memory	19.8 GiB
Avg memory	19.7 GiB
CPU user	1485.2s
CPU sys	1.1s
Peak spill	0 B

kosiew added 5 commits May 6, 2026 11:49

github-actions Bot added labels sqllogictest (SQL Logic Tests (.slt)); physical-plan (Changes to the physical-plan crate) on May 6, 2026

kosiew marked this pull request as ready for review May 6, 2026 06:17

kosiew added 6 commits May 10, 2026 23:37

kosiew changed the title ~~Preserve recursive CTE static schema with plan-time schema alignment~~ Widen recursive CTE logical schema nullability May 11, 2026

github-actions Bot added labels sql (SQL Planner); logical-expr (Logical plan and expressions); core (Core DataFusion crate); proto (Related to proto crate); auto detected api change (Auto detected API change) on May 11, 2026

kosiew added 4 commits May 11, 2026 10:17

Revert to 63f62a8: feat: enhance recursive query validation and testing

365b9ca

feat: enhance recursive query handling by aligning schemas and preserving metadata

973f93e

github-actions Bot removed labels core (Core DataFusion crate); auto detected api change (Auto detected API change) on May 11, 2026

neilconway suggested changes May 11, 2026

kosiew marked this pull request as draft May 13, 2026 09:18

kosiew added 10 commits May 13, 2026 21:32

Revert to 739e147: Add reusable plan-time schema alignment helper and apply to RecursiveQueryExec (apache#21912)

59e8d92

Merge branch 'main' into nullability-mismatch-22034

1c990d7

fix: update explain_tree.slt to reflect correct type casting in projection

3389785

fix: correct SUM(0) -> 0 as level in recursive CTE query

6953076

fix: update type casting in projection for explain_analyze test

18b06b0

- Changed projection expressions in the `parquet_recursive_projection_pushdown` test to use `CAST` for consistency and improved type safety.

feat: update TreeNode::exists usage and optimize CTE handling

aa7fb40

- Refactored TreeNode::exists in physical planner and CTE modules
- Removed redundant recursive CTE re-coercion in logical plan builder
- Inlined small one-use variables in recursive schema module

github-actions Bot added labels core (Core DataFusion crate); common (Related to common crate) and removed label physical-plan (Changes to the physical-plan crate) on May 13, 2026

kosiew changed the title ~~Widen recursive CTE logical schema nullability~~ Preserve recursive CTE nullability across logical and physical planning May 13, 2026

kosiew marked this pull request as ready for review May 13, 2026 13:55

kosiew wants to merge 25 commits into apache:main from kosiew:nullability-mismatch-22034

kosiew commented May 6, 2026 (edited)

neilconway commented May 9, 2026 (edited)

kosiew commented May 11, 2026

neilconway left a comment

kosiew commented May 13, 2026 (edited)

kosiew commented May 14, 2026

adriangbot commented May 14, 2026

kosiew commented May 14, 2026

adriangbot commented May 14, 2026

adriangbot commented May 14, 2026

3 participants
