Skip to content

improving optimizer TopNOnIndexRule #22570

@wenym1

Description

@wenym1

@wenym1 The key is the bottom BatchScan, there is a limit: 1 so that the entire SQL only needs to scan one row.

Oh, I got your point. Previously the TopNOnIndexRule does not apply due to the issue in the PR description, and the BatchScan on index is generated later in the simple IndexSelectionRule. The difference is that the BatchTopN generated by TopNOnIndexRule has a prefix that , so that its limit can be later pushed down to the BatchScan.
Theoretical, if we can have a new optimizer rule, we can push down the limit in BatchTopN to the BatchScan, when seeing that (eq scan range in BatchScan + order of BatchTopN) is the prefix of (pk of the table of BatchScan), such as for the plan below

...
   └─BatchTopN { order: [t_index_ab.b ASC], limit: 1, offset: 0 }
     └─BatchScan { table: t_index_ab, columns: [a, b, c], scan_ranges: [a = Int32(10)] }

It is exactly what TopNOnIndexRule has done.

I see. But I think the optimizer is not generating the best query plan. Instead of

BatchSort { order: [t_index_ab.b ASC] }
 └─BatchTopN { order: [t_index_ab.a DESC, t_index_ab.b ASC], limit: 1, offset: 0 }
   └─BatchExchange { order: [], dist: Single }
     └─BatchLimit { limit: 1, offset: 0 }
       └─BatchScan { table: t_index_ab, columns: [a, b, c], scan_ranges: [a = Int32(10)], limit: 1 }

a better one would be like

BatchProject { exprs: [Int32(10) as a, t_index_ab.b, t_index_ab.c] }
└─BatchLimit { limit: 1, offset: 0 }
   └─BatchExchange { order: [t_index_ab.b], dist: Single }
     └─BatchScan { table: t_index_ab, columns: [b, c], scan_ranges: [a = Int32(10)], limit: 1 }

The optimization trace (enabled by running explain(trace) <QUERY>) shall be like
0. starting from

LogicalTopN { order: [t.c ASC], limit: 1, offset: 0 }
 └─LogicalScan { table: t, columns: [t.a, t.b, t.c], predicate: (t.a = 10:Int32) }
  1. apply TopNOnIndexRule
LogicalProject { exprs: [10:Int32 as a, t_index_ab.b, t_index_ab.c] }
  └─LogicalTopN { order: [t_index_ab.b ASC], limit: 1, offset: 0 }
     └─LogicalScan { table: t_index_ab, columns: [t_index_ab.b, t_index_ab.c], predicate: (t_index_ab.a = 10:Int32) }

the key difference is that

  • we don't include the a from the output of LogicalScan, and instead we set a as a const value via a later LogicalProject.
  • the order of LogicalTopN includes only b, instead of [a, b]. The current [a, b] has prevented us from doing many later optimization.
  1. turn logical plan into a same batch plan
BatchProject { exprs: [10:Int32 as a, t_index_ab.b, t_index_ab.c] }
  └─BatchTopN { order: [t_index_ab.b ASC], limit: 1, offset: 0 }
     └─BatchScan { table: t_index_ab, columns: [t_index_ab.b, t_index_ab.c], predicate: (t_index_ab.a = 10:Int32) }

the upper most BatchSort is not needed, because the BatchTopN is already ordered by b.
3. turn into physical plan for each plan node

BatchProject { exprs: [Int32(10) as a, t_index_ab.b, t_index_ab.c] }
└─BatchLimit { limit: 1, offset: 0 }
   └─BatchExchange { order: [t_index_ab.b], dist: Single }
     └─BatchLimit { limit: 1, offset: 0 }
       └─BatchScan { table: t_index_ab, columns: [b, c], scan_ranges: [a = Int32(10)]}

the key point here is that

  • for BatchScan, when some of the eq predicates is the prefix (a = Int32(10)) of table pk (a, b), and the prefix of output columns matches the subsequent columns (b) after the eq predicates in pk, the BatchScan should have order as the prefix of output column (b)
  • when turning BatchTopN into two phase physical plan, when seeing the order of input matches the order of itself, not only turning the first phase as BatchLimit, but also the second phase.
  1. push down limit to BatchScan
BatchProject { exprs: [Int32(10) as a, t_index_ab.b, t_index_ab.c] }
└─BatchLimit { limit: 1, offset: 0 }
   └─BatchExchange { order: [t_index_ab.b], dist: Single }
     └─BatchScan { table: t_index_ab, columns: [b, c], scan_ranges: [a = Int32(10)], limit: 1 }

I will merge this PR first. This refinement can be done in a separate PR. I will create an issue on it.

Originally posted by @wenym1 in #22561 (comment)

Metadata

Metadata

Assignees

Labels

Type

No type

Projects

No projects

Relationships

None yet

Development

No branches or pull requests

Issue actions