improving optimizer TopNOnIndexRule

> > > @wenym1 The key is the bottom `BatchScan`, there is a `limit: 1` so that the entire SQL only needs to scan one row.
> > 
> > 
> > Oh, I got your point. Previously the `TopNOnIndexRule` does not apply due to the issue in the PR description, and the `BatchScan` on index is generated later in the simple `IndexSelectionRule`. The difference is that the `BatchTopN` generated by `TopNOnIndexRule` has a prefix that , so that its limit can be later pushed down to the `BatchScan`.
> > Theoretical, if we can have a new optimizer rule, we can push down the limit in `BatchTopN` to the `BatchScan`, when seeing that (eq scan range in `BatchScan` + order of `BatchTopN`) is the prefix of (pk of the table of `BatchScan`), such as for the plan below
> > ```
> > ...
> >    └─BatchTopN { order: [t_index_ab.b ASC], limit: 1, offset: 0 }
> >      └─BatchScan { table: t_index_ab, columns: [a, b, c], scan_ranges: [a = Int32(10)] }
> > ```
> 
> It is exactly what TopNOnIndexRule has done.

I see. But I think the optimizer is not generating the best query plan. Instead of 
```
BatchSort { order: [t_index_ab.b ASC] }
 └─BatchTopN { order: [t_index_ab.a DESC, t_index_ab.b ASC], limit: 1, offset: 0 }
   └─BatchExchange { order: [], dist: Single }
     └─BatchLimit { limit: 1, offset: 0 }
       └─BatchScan { table: t_index_ab, columns: [a, b, c], scan_ranges: [a = Int32(10)], limit: 1 }
```
a better one would be like
```
BatchProject { exprs: [Int32(10) as a, t_index_ab.b, t_index_ab.c] }
└─BatchLimit { limit: 1, offset: 0 }
   └─BatchExchange { order: [t_index_ab.b], dist: Single }
     └─BatchScan { table: t_index_ab, columns: [b, c], scan_ranges: [a = Int32(10)], limit: 1 }
```

The optimization trace (enabled by running `explain(trace) <QUERY>`) shall be like
0. starting from
```
LogicalTopN { order: [t.c ASC], limit: 1, offset: 0 }
 └─LogicalScan { table: t, columns: [t.a, t.b, t.c], predicate: (t.a = 10:Int32) }
```
1. apply TopNOnIndexRule
```
LogicalProject { exprs: [10:Int32 as a, t_index_ab.b, t_index_ab.c] }
  └─LogicalTopN { order: [t_index_ab.b ASC], limit: 1, offset: 0 }
     └─LogicalScan { table: t_index_ab, columns: [t_index_ab.b, t_index_ab.c], predicate: (t_index_ab.a = 10:Int32) }
```
the key difference is that
* we don't include the `a` from the output of `LogicalScan`, and instead we set `a` as a const value via a later `LogicalProject`.
* the `order` of `LogicalTopN` includes only `b`, instead of `[a, b]`. The current `[a, b]` has prevented us from doing many later optimization.
2. turn logical plan into a same batch plan
```
BatchProject { exprs: [10:Int32 as a, t_index_ab.b, t_index_ab.c] }
  └─BatchTopN { order: [t_index_ab.b ASC], limit: 1, offset: 0 }
     └─BatchScan { table: t_index_ab, columns: [t_index_ab.b, t_index_ab.c], predicate: (t_index_ab.a = 10:Int32) }
```
the upper most `BatchSort` is not needed, because the `BatchTopN` is already ordered by `b`.
3. turn into physical plan for each plan node
```
BatchProject { exprs: [Int32(10) as a, t_index_ab.b, t_index_ab.c] }
└─BatchLimit { limit: 1, offset: 0 }
   └─BatchExchange { order: [t_index_ab.b], dist: Single }
     └─BatchLimit { limit: 1, offset: 0 }
       └─BatchScan { table: t_index_ab, columns: [b, c], scan_ranges: [a = Int32(10)]}
```
the key point here is that
* for `BatchScan`, when some of the eq predicates is the prefix (a = Int32(10)) of table pk (a, b), and the prefix of output columns matches the subsequent columns (b) after the eq predicates in pk, the `BatchScan` should have order as the prefix of output column (b)
* when turning `BatchTopN` into two phase physical plan, when seeing the order of input matches the order of itself, not only turning the first phase as `BatchLimit`, but also the second phase.
4. push down limit to `BatchScan`
```
BatchProject { exprs: [Int32(10) as a, t_index_ab.b, t_index_ab.c] }
└─BatchLimit { limit: 1, offset: 0 }
   └─BatchExchange { order: [t_index_ab.b], dist: Single }
     └─BatchScan { table: t_index_ab, columns: [b, c], scan_ranges: [a = Int32(10)], limit: 1 }
```

I will merge this PR first. This refinement can be done in a separate PR. I will create an issue on it.

_Originally posted by @wenym1 in https://github.com/risingwavelabs/risingwave/issues/22561#issuecomment-3060840674_
            

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

improving optimizer TopNOnIndexRule #22570

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

improving optimizer TopNOnIndexRule #22570

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions