HIVE-29614: Fix incorrect column lineage for multiple window functions with identical partition keys#6488
HIVE-29614: Fix incorrect column lineage for multiple window functions with identical partition keys#6488ljq-dmr wants to merge 1 commit into
Conversation
…s with identical partition keys
|
There was a problem hiding this comment.
Pull request overview
This PR updates Hive’s lineage propagation for PTF/windowing operators to correctly account for multiple window functions (instead of only the first), and refreshes LLAP qtest golden files accordingly to reflect the corrected lineage dependencies.
Changes:
- Update PTF lineage derivation to iterate across all window functions when building dependency information.
- Add a focused lineage qtest (
lineage7.q) that exercises multiple window functions with identical partition keys. - Refresh existing LLAP windowing golden outputs to include the newly-correct lineage dependencies.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| ql/src/java/org/apache/hadoop/hive/ql/optimizer/lineage/OpProcFactory.java | Updates PTF/windowing lineage dependency construction to handle multiple window functions. |
| ql/src/test/queries/clientpositive/lineage7.q | Adds a new qtest query to reproduce/cover multi-window-function lineage behavior. |
| ql/src/test/results/clientpositive/llap/lineage7.q.out | Adds the expected LLAP output for the new lineage qtest. |
| ql/src/test/results/clientpositive/llap/windowing.q.out | Updates expected lineage output for non-vector LLAP windowing tests. |
| ql/src/test/results/clientpositive/llap/vector_windowing.q.out | Updates expected lineage output for vectorized LLAP windowing tests. |
| ql/src/test/results/clientpositive/llap/cbo_rp_windowing_2.q.out | Updates expected lineage output for CBO regression windowing tests. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| for(ColumnInfo ci : op.getSchema().getSignature()) { | ||
| Dependency d = dep; | ||
| Dependency depCi = lCtx.getIndex().getDependency(inpOp, ci); | ||
| if (depCi != null) { | ||
| d = depCi; | ||
| } | ||
| Dependency d = null; | ||
| Dependency depCi = lCtx.getIndex().getDependency(inpOp, ci); | ||
| if (depCi != null) { | ||
| d = depCi; |
There was a problem hiding this comment.
PTFTranslator.buildRowResolverForWindowing is actually unrelated to the current getDependency(inpOp, ci) call. This is because the operation retrieves data directly from the depMap within LineageCtx.
The else "fallback" logic you mentioned is specific to the matchpath syntax (as seen in ql/src/test/queries/clientpositive/ptf_matchpath.q). However, I believe the resulting lineage in that case is correct and intended, as it accurately reflects the dependencies for columns generated within the matchpath function.



What changes were proposed in this pull request?
Iterate over if there are multiple functions instead of getWindowFunctions().getFirst()
Why are the changes needed?
PTFOperator may contain multiple functions
Does this PR introduce any user-facing change?
No
How was this patch tested?
mvn test -Pitests -pl itests/qtest -Dtest=TestMiniLlapLocalCliDriver -Dqfile=cbo_rp_windowing_2.q,ptf.q,ptf_matchpath.q,vector_windowing.q,vectorized_ptf.q,windowing.q -Dtest.output.overwrite=true