@wenym1 The key is the bottom BatchScan, there is a limit: 1 so that the entire SQL only needs to scan one row.
Oh, I got your point. Previously the TopNOnIndexRule does not apply due to the issue in the PR description, and the BatchScan on index is generated later in the simple IndexSelectionRule. The difference is that the BatchTopN generated by TopNOnIndexRule has a prefix that , so that its limit can be later pushed down to the BatchScan.
Theoretical, if we can have a new optimizer rule, we can push down the limit in BatchTopN to the BatchScan, when seeing that (eq scan range in BatchScan + order of BatchTopN) is the prefix of (pk of the table of BatchScan), such as for the plan below
...
└─BatchTopN { order: [t_index_ab.b ASC], limit: 1, offset: 0 }
└─BatchScan { table: t_index_ab, columns: [a, b, c], scan_ranges: [a = Int32(10)] }
It is exactly what TopNOnIndexRule has done.
I see. But I think the optimizer is not generating the best query plan. Instead of
BatchSort { order: [t_index_ab.b ASC] }
└─BatchTopN { order: [t_index_ab.a DESC, t_index_ab.b ASC], limit: 1, offset: 0 }
└─BatchExchange { order: [], dist: Single }
└─BatchLimit { limit: 1, offset: 0 }
└─BatchScan { table: t_index_ab, columns: [a, b, c], scan_ranges: [a = Int32(10)], limit: 1 }
a better one would be like
BatchProject { exprs: [Int32(10) as a, t_index_ab.b, t_index_ab.c] }
└─BatchLimit { limit: 1, offset: 0 }
└─BatchExchange { order: [t_index_ab.b], dist: Single }
└─BatchScan { table: t_index_ab, columns: [b, c], scan_ranges: [a = Int32(10)], limit: 1 }
The optimization trace (enabled by running explain(trace) <QUERY>) shall be like
0. starting from
LogicalTopN { order: [t.c ASC], limit: 1, offset: 0 }
└─LogicalScan { table: t, columns: [t.a, t.b, t.c], predicate: (t.a = 10:Int32) }
- apply TopNOnIndexRule
LogicalProject { exprs: [10:Int32 as a, t_index_ab.b, t_index_ab.c] }
└─LogicalTopN { order: [t_index_ab.b ASC], limit: 1, offset: 0 }
└─LogicalScan { table: t_index_ab, columns: [t_index_ab.b, t_index_ab.c], predicate: (t_index_ab.a = 10:Int32) }
the key difference is that
- we don't include the
a from the output of LogicalScan, and instead we set a as a const value via a later LogicalProject.
- the
order of LogicalTopN includes only b, instead of [a, b]. The current [a, b] has prevented us from doing many later optimization.
- turn logical plan into a same batch plan
BatchProject { exprs: [10:Int32 as a, t_index_ab.b, t_index_ab.c] }
└─BatchTopN { order: [t_index_ab.b ASC], limit: 1, offset: 0 }
└─BatchScan { table: t_index_ab, columns: [t_index_ab.b, t_index_ab.c], predicate: (t_index_ab.a = 10:Int32) }
the upper most BatchSort is not needed, because the BatchTopN is already ordered by b.
3. turn into physical plan for each plan node
BatchProject { exprs: [Int32(10) as a, t_index_ab.b, t_index_ab.c] }
└─BatchLimit { limit: 1, offset: 0 }
└─BatchExchange { order: [t_index_ab.b], dist: Single }
└─BatchLimit { limit: 1, offset: 0 }
└─BatchScan { table: t_index_ab, columns: [b, c], scan_ranges: [a = Int32(10)]}
the key point here is that
- for
BatchScan, when some of the eq predicates is the prefix (a = Int32(10)) of table pk (a, b), and the prefix of output columns matches the subsequent columns (b) after the eq predicates in pk, the BatchScan should have order as the prefix of output column (b)
- when turning
BatchTopN into two phase physical plan, when seeing the order of input matches the order of itself, not only turning the first phase as BatchLimit, but also the second phase.
- push down limit to
BatchScan
BatchProject { exprs: [Int32(10) as a, t_index_ab.b, t_index_ab.c] }
└─BatchLimit { limit: 1, offset: 0 }
└─BatchExchange { order: [t_index_ab.b], dist: Single }
└─BatchScan { table: t_index_ab, columns: [b, c], scan_ranges: [a = Int32(10)], limit: 1 }
I will merge this PR first. This refinement can be done in a separate PR. I will create an issue on it.
Originally posted by @wenym1 in #22561 (comment)
I see. But I think the optimizer is not generating the best query plan. Instead of
a better one would be like
The optimization trace (enabled by running
explain(trace) <QUERY>) shall be like0. starting from
the key difference is that
afrom the output ofLogicalScan, and instead we setaas a const value via a laterLogicalProject.orderofLogicalTopNincludes onlyb, instead of[a, b]. The current[a, b]has prevented us from doing many later optimization.the upper most
BatchSortis not needed, because theBatchTopNis already ordered byb.3. turn into physical plan for each plan node
the key point here is that
BatchScan, when some of the eq predicates is the prefix (a = Int32(10)) of table pk (a, b), and the prefix of output columns matches the subsequent columns (b) after the eq predicates in pk, theBatchScanshould have order as the prefix of output column (b)BatchTopNinto two phase physical plan, when seeing the order of input matches the order of itself, not only turning the first phase asBatchLimit, but also the second phase.BatchScanI will merge this PR first. This refinement can be done in a separate PR. I will create an issue on it.
Originally posted by @wenym1 in #22561 (comment)