MDEV-39499 Updates to derived-with-keys, window functions determining by mariadb-RexJohnston · Pull Request #5252 · MariaDB/server

mariadb-RexJohnston · 2026-06-17T21:19:29Z

..records per key

Enabling derived keys optimization for derived.col = const pushed conditions.

Estimating records per key in derived key for the optimizer based on form and/or size of components of a derived table.

Consider any derived table of the form

SELECT ..., ROW_NUMBER () OVER (PARTITION BY c1,c2 order by ...) FROM t1, t2, t3 ...
WHERE ...

If the optimizer generates a key on this derived table because of a constraint being pushed into it, it currently will not consider key components of the form col = const

We lift this constraint and add code to TABLE::add_tmp_key to search for a window function ROW_NUMBER(). From the partition list c1, c2 we can in infer an estimate of the number of rows we expect to see for each key value. In the absence of EITS, we still have an number of expected records in our referred to table and will fall back to using that as a worst case scenario.

As a consequence of forcing best_access_path to now consider a generated index's records per key on any type of derived table, we see a number of optimization changes, for example a range optimizer might now be considered best and might return IMPOSSIBLE_RANGE (seen in rowid_filter_myisam.result).

gemini-code-assist

Code Review

This pull request introduces cardinality estimation for derived tables with window functions (specifically ROW_NUMBER) by adding opt_window_function_cardinality.cc and integrating it into temporary key creation. Feedback on these changes highlights critical compilation errors due to type mismatches between ulong* and rec_per_key_t* (which is double*) in the new estimation function. Additionally, several high-severity issues were identified, including potential null pointer dereferences when resolving fields and a potential buffer overflow in TABLE::add_tmp_key if key_parts exceeds MAX_REF_PARTS.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

..records per key Enabling derived keys optimization for derived.col = const pushed conditions. Estimating records per key in derived key for the optimizer based on form and/or size of components of a derived table. Consider any derived table of the form SELECT ..., ROW_NUMBER () OVER (PARTITION BY c1,c2 order by ...) FROM t1, t2, t3 ... WHERE ... If the optimizer generates a key on this derived table because of a constraint being pushed into it, it currently will not consider key components of the form col = const We lift this constraint and add code to TABLE::add_tmp_key to search for a window function ROW_NUMBER(). From the partition list c1, c2 we can in infer an estimate of the number of rows we expect to see for each key value. In the absence of EITS, we still have an number of expected records in our referred to table and will fall back to using that as a worst case scenario. As a consequence of forcing best_access_path to now consider a generated index's records per key on any type of derived table, we see a number of optimization changes, for example a range optimizer might now be considered best and might return IMPOSSIBLE_RANGE (seen in rowid_filter_myisam.result).

mariadb-RexJohnston · 2026-06-19T02:53:05Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces a mechanism to estimate the cardinality of derived tables with window functions (specifically ROW_NUMBER() OVER (PARTITION BY ...)), allowing the optimizer to better determine records per key. It adds opt_window_function_cardinality.cc and opt_window_function_cardinality.h, integrates them into the build files and TABLE::add_tmp_key, and includes extensive test coverage. The review feedback highlights several critical improvements: 1) In handle_single_part_rownumber, HA_POS_ERROR must be handled safely to prevent massive values from casting, and multi-column partition estimates should be capped by the table's total records. 2) In est_derived_window_fn_cardinality, comparing result_field directly with the key field is a much more robust heuristic than using select list indices, and rec_per_key_t should be used instead of ulong for type safety. 3) A defensive assertion should be added in TABLE::add_tmp_key to prevent potential stack buffer overflows.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-19T02:55:54Z

+static ha_rows handle_single_part_rownumber( Item_window_func *item_row_number)
+{
+  if (item_row_number->window_spec->partition_list &&
+      item_row_number->window_spec->partition_list->elements)
+  {
+    double records= 1;
+    ORDER *part_list=  item_row_number->window_spec->partition_list->first;
+    for (; part_list; part_list= part_list->next)
+    {
+      if ((*part_list->item)->type() == Item::FIELD_ITEM)
+      {
+        Item_field *item_field= ((Item_field*)(*part_list->item));
+        DBUG_ASSERT(item_field);
+        Field *field= item_field->field;
+        DBUG_ASSERT(field);
+        /*
+          use EITS records per key if we have them,
+          otherwise
+          use number of records in the table (as a worst case scenario)
+        */
+        if (field->read_stats)
+        {
+          //  records*= field->read_stats->get_avg_frequency();
+          double freq= field->read_stats->get_avg_frequency();
+          records*= freq > 0 ? field->table->used_stat_records / freq
+                     : (double)field->table->used_stat_records;
+        }
+        else
+          records*= (double)field->table->used_stat_records;
+      }
+    }
+    return records < 1.0 ? (ha_rows)1 : (ha_rows)records;
+  }
+  else
+  {
+    return (ha_rows)1;
+  }
+}


The current implementation of handle_single_part_rownumber has two significant issues:

Handling of HA_POS_ERROR: If table->used_stat_records is HA_POS_ERROR (indicating unknown/uninitialized statistics), casting it to double will result in a huge value (approx 1.8e19), which will completely distort the cardinality estimation and potentially cause overflows.

Exaggerated Estimates for Multi-column Partitions: When partitioning by multiple columns of the same table, multiplying their distinct values can easily exceed the total number of records in the table. The accumulated distinct values for each table should be capped by that table's total records.

We can resolve both issues by tracking distinct values per table and capping them, while safely falling back to transient engine statistics or a default value if used_stat_records is HA_POS_ERROR.

static ha_rows handle_single_part_rownumber(Item_window_func *item_row_number) { if (item_row_number->window_spec->partition_list && item_row_number->window_spec->partition_list->elements) { double records= 1; const uint MAX_TABLES= 16; TABLE *tables[MAX_TABLES]; double table_distinct[MAX_TABLES]; uint table_count= 0; ORDER *part_list= item_row_number->window_spec->partition_list->first; for (; part_list; part_list= part_list->next) { if ((*part_list->item)->type() == Item::FIELD_ITEM) { Item_field *item_field= ((Item_field*)(*part_list->item)); DBUG_ASSERT(item_field); Field *field= item_field->field; DBUG_ASSERT(field); TABLE *table= field->table; ha_rows table_records= table->used_stat_records; if (table_records == HA_POS_ERROR) { table_records= table->file->stats.records; if (table_records == HA_POS_ERROR) table_records= 1000; // Fallback default } double diff_values= 1.0; if (field->read_stats) { double freq= field->read_stats->get_avg_frequency(); diff_values= freq > 0 ? (double)table_records / freq : (double)table_records; } else { diff_values= (double)table_records; } uint idx= 0; for (; idx < table_count; idx++) { if (tables[idx] == table) break; } if (idx < MAX_TABLES) { if (idx == table_count) { tables[idx]= table; table_distinct[idx]= diff_values; table_count++; } else { table_distinct[idx]*= diff_values; } if (table_distinct[idx] > (double)table_records) table_distinct[idx]= (double)table_records; } } } for (uint i= 0; i < table_count; i++) { records*= table_distinct[i]; } return records < 1.0 ? (ha_rows)1 : (ha_rows)records; } else { return (ha_rows)1; } }

gemini-code-assist · 2026-06-19T02:55:54Z

+bool est_derived_window_fn_cardinality(st_select_lex* derived,
+                                       ulong *out_records,
+                                       Field **reg_fields,
+                                       uint key_parts)
+{
+  uint fc= 0;
+
+  while (fc < key_parts)
+  {
+    uint ic= 0;
+    if (!reg_fields[fc])
+    {
+      fc++;
+      continue;
+    }
+    uint field_index= reg_fields[fc]->field_index;
+    List_iterator_fast<Item> li(*derived->get_item_list());
+    Item *item;
+
+    // find our window function in the item list
+    while ((item= li++))
+    {
+      // if the single part key is on our window function
+      if ((ic == field_index) && item->type() == Item::WINDOW_FUNC_ITEM)
+      {
+        Item_window_func *itm_row_num= (Item_window_func*)item;
+
+        if (itm_row_num->window_func()->sum_func() == Item_sum::ROW_NUMBER_FUNC)
+        {
+          *out_records= (ulong)handle_single_part_rownumber(itm_row_num);
+          return true;
+        }
+      }
+      ic++;
+    }
+    fc++;
+  }
+
+  return false;
+}


The current implementation of est_derived_window_fn_cardinality relies on comparing the select list index ic with the temporary table's field_index. This index-based heuristic is fragile and can break if fields are optimized away, reordered, or if hidden fields are present in the temporary table.

Instead, we can directly compare the result_field of the Item with the key field reg_fields[fc]. This is 100% robust and standard in the optimizer. Additionally, we should use rec_per_key_t instead of ulong for out_records to ensure type safety and future-proofing.

bool est_derived_window_fn_cardinality(st_select_lex* derived, rec_per_key_t *out_records, Field **reg_fields, uint key_parts) { uint fc= 0; while (fc < key_parts) { if (!reg_fields[fc]) { fc++; continue; } List_iterator_fast<Item> li(*derived->get_item_list()); Item *item; // find our window function in the item list while ((item= li++)) { // Compare the result_field of the item directly with the key field if (item->result_field == reg_fields[fc] && item->type() == Item::WINDOW_FUNC_ITEM) { Item_window_func *itm_row_num= (Item_window_func*)item; if (itm_row_num->window_func()->sum_func() == Item_sum::ROW_NUMBER_FUNC) { *out_records= (rec_per_key_t)handle_single_part_rownumber(itm_row_num); return true; } } } fc++; } return false; }

gemini-code-assist · 2026-06-19T02:55:54Z

+bool  est_derived_window_fn_cardinality(st_select_lex* derived,
+                                        ulong *out_records,
+                                        Field **reg_fields,
+                                        uint key_parts);


Update the function signature to use rec_per_key_t instead of ulong for the out_records parameter to maintain type safety and consistency with the optimizer's rec_per_key array.

Suggested change

bool est_derived_window_fn_cardinality(st_select_lex* derived,

ulong *out_records,

Field **reg_fields,

uint key_parts);

bool est_derived_window_fn_cardinality(st_select_lex* derived,

rec_per_key_t *out_records,

Field **reg_fields,

uint key_parts);

gemini-code-assist · 2026-06-19T02:55:55Z

  char buf[NAME_CHAR_LEN];
  KEY *keyinfo= key_info + key;
  KEY_PART_INFO *key_part_info;
-  Field **reg_field;
+  Field *reg_field[MAX_REF_PARTS];
  uint i;
  bool key_start= TRUE;


Since reg_field is now declared as a fixed-size stack array of size MAX_REF_PARTS, we should add a defensive assertion DBUG_ASSERT(key_parts <= MAX_REF_PARTS); at the beginning of TABLE::add_tmp_key to prevent any potential stack buffer overflow if key_parts exceeds MAX_REF_PARTS.

char buf[NAME_CHAR_LEN]; KEY *keyinfo= key_info + key; KEY_PART_INFO *key_part_info; Field *reg_field[MAX_REF_PARTS]; uint i; bool key_start= TRUE; DBUG_ASSERT(key_parts <= MAX_REF_PARTS);

gemini-code-assist Bot reviewed Jun 17, 2026

View reviewed changes

Comment thread sql/opt_window_function_cardinality.h

Comment thread sql/opt_window_function_cardinality.cc Outdated

Comment thread sql/table.cc

Comment thread sql/opt_window_function_cardinality.cc

mariadb-RexJohnston mentioned this pull request Jun 17, 2026

MDEV-39499 Updates to derived-with-keys, window functions determining records per key #5020

Closed

gkodinov added the MariaDB Corporation label Jun 18, 2026

MariaDB deleted a comment from gemini-code-assist Bot Jun 18, 2026

mariadb-RexJohnston force-pushed the 11.4-MDEV-39499 branch from c4c2f7d to f824984 Compare June 18, 2026 22:05

gemini-code-assist Bot reviewed Jun 19, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

MDEV-39499 Updates to derived-with-keys, window functions determining#5252

MDEV-39499 Updates to derived-with-keys, window functions determining#5252
mariadb-RexJohnston wants to merge 1 commit into
11.4from
11.4-MDEV-39499

mariadb-RexJohnston commented Jun 17, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

mariadb-RexJohnston commented Jun 19, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Jun 19, 2026

Uh oh!

gemini-code-assist Bot Jun 19, 2026

Uh oh!

gemini-code-assist Bot Jun 19, 2026

Uh oh!

gemini-code-assist Bot Jun 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

mariadb-RexJohnston commented Jun 17, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

mariadb-RexJohnston commented Jun 19, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jun 19, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 19, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 19, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 19, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

2 participants