[GH-3089] Fix empty-envelope false candidate and RS_DWithin EXPLAIN label in raster distance join #3090

Draft
jiayuasu wants to merge 1 commit into apache:master from jiayuasu:fix/rs-dwithin-raster-disposal


@jiayuasu jiayuasu commented Jun 30, 2026

Member

Did you read the Contributor Guide?

Is this PR related to a ticket?

What changes were proposed in this PR?

Two issues in the raster distance-join machinery built around RS_DWithin, reported in #3089. Both changes are confined to spark/common.

  1. Empty-envelope false candidate. When a raster or geometry input is NULL, the join substitutes an empty GeometryCollection. Previously, expandRasterFilterEnvelope expanded that geometry's degenerate envelope by distance, producing a non-empty R-tree filter envelope that pulled the row in as a coarse-filter candidate (and could produce a NaN-bounded envelope from the null Envelope). It now returns the empty shape unchanged, so the coarse filter excludes it.

  2. Misleading EXPLAIN output. The raster distance branch of BroadcastIndexJoinExec.simpleString printed RS_Distance(left, right) < r — a function that does not exist in Sedona. It now prints RS_DWithin(left, right, r), naming the actual predicate that drives the join.
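To illustrate the first issue, here is a minimal, self-contained sketch of why padding an empty bounding box without a guard can create a false coarse-filter match. `Box`, `expandNaive`, and `expandGuarded` are hypothetical names invented for this illustration; they are not Sedona's or JTS's classes, and the real expandRasterFilterEnvelope may compute its expansion differently.

```scala
// Hypothetical stand-in for a bounding box; NOT the JTS Envelope or any Sedona type.
final case class Box(minX: Double, minY: Double, maxX: Double, maxY: Double) {
  // Assumed convention for this sketch: max < min on either axis means "covers nothing".
  def isEmpty: Boolean = maxX < minX || maxY < minY

  // Two boxes intersect only if both are non-empty and their ranges overlap on both axes.
  def intersects(o: Box): Boolean =
    !isEmpty && !o.isEmpty &&
      minX <= o.maxX && o.minX <= maxX &&
      minY <= o.maxY && o.minY <= maxY
}

object EnvelopeExpansionSketch {
  // An empty box, standing in for the envelope of the empty geometry substituted for NULL.
  val Empty: Box = Box(0.0, 0.0, -1.0, -1.0)

  // Unguarded expansion: pads the raw bounds without checking for emptiness.
  def expandNaive(b: Box, d: Double): Box =
    Box(b.minX - d, b.minY - d, b.maxX + d, b.maxY + d)

  // Guarded expansion: an empty box is returned unchanged, so it stays empty.
  def expandGuarded(b: Box, d: Double): Box =
    if (b.isEmpty) b else expandNaive(b, d)

  def main(args: Array[String]): Unit = {
    val probe = Box(1.0, 1.0, 2.0, 2.0)
    println(expandNaive(Empty, 5.0).intersects(probe))   // true: spurious candidate
    println(expandGuarded(Empty, 5.0).intersects(probe)) // false: row is excluded
  }
}
```

In this model, padding the empty box by 5 yields the bounds (-5, -5, 4, 4), which is no longer empty and overlaps the probe. The guarded variant mirrors the early return described above.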

How was this patch tested?

sedona-spark-common compiles cleanly with the changes. Both are minimal changes within existing code paths exercised by RasterJoinSuite and BroadcastIndexJoinSuite.

Did this PR include necessary documentation updates?

  • No, this PR does not affect any public API so no need to change the documentation.

@jiayuasu jiayuasu marked this pull request as draft June 30, 2026 21:32
…LAIN label in raster distance join

Two issues in the raster distance-join machinery built around RS_DWithin
(GH-3089), both confined to spark/common:

1. Empty-envelope false candidate. When a raster or geometry input is NULL, the
   join substitutes an empty GeometryCollection. expandRasterFilterEnvelope then
   expanded that geometry's degenerate envelope by `distance`, producing a
   non-empty R-tree filter envelope that pulls the row in as a coarse-filter
   candidate (and can yield a NaN-bounded envelope). Return the empty shape
   unchanged so the filter excludes it.

2. Misleading EXPLAIN output. The raster distance branch of
   BroadcastIndexJoinExec.simpleString printed "RS_Distance(left, right) < r" —
   a function that does not exist in Sedona. Print "RS_DWithin(left, right, r)"
   to match the actual join predicate.
@jiayuasu jiayuasu force-pushed the fix/rs-dwithin-raster-disposal branch from 5e0ee4c to be8ad38 Compare July 1, 2026 06:39
@jiayuasu jiayuasu changed the title from "[GH-3089] Dispose rasters in RS_DWithin join path and fix empty-envelope filter" to "[GH-3089] Fix empty-envelope false candidate and RS_DWithin EXPLAIN label in raster distance join" Jul 1, 2026
@jiayuasu jiayuasu requested a review from Copilot July 1, 2026 06:54

Copilot AI left a comment

Contributor


Pull request overview

This PR addresses two correctness/diagnostic issues in Sedona’s raster distance-join implementation (RS_DWithin) within spark/common: preventing NULL inputs (represented as empty shapes) from creating spurious coarse-filter candidates, and correcting the physical plan EXPLAIN label to reflect the actual raster predicate.

Changes:

  • Prevent expansion of empty raster/geometry envelopes in expandRasterFilterEnvelope, avoiding false R-tree candidates for NULL inputs.
  • Update BroadcastIndexJoinExec.simpleString to display RS_DWithin(left, right, r) instead of a non-existent RS_Distance(...) < r predicate.
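The label change can be pictured with a small sketch of the two renderings side by side. The helper names `oldLabel` and `newLabel` and the sample operand strings are hypothetical and exist only for this illustration; the actual formatting is done in BroadcastIndexJoinExec.simpleString and may differ in detail.

```scala
object ExplainLabelSketch {
  // Hypothetical helper: the previous rendering, which named a function
  // (RS_Distance) that does not exist in Sedona.
  def oldLabel(left: String, right: String, r: String): String =
    s"RS_Distance($left, $right) < $r"

  // Hypothetical helper: the corrected rendering, which names the predicate
  // that actually drives the join.
  def newLabel(left: String, right: String, r: String): String =
    s"RS_DWithin($left, $right, $r)"

  def main(args: Array[String]): Unit = {
    // Operand strings below are made-up placeholders, not real plan output.
    println(oldLabel("rast#1", "geom#2", "100.0")) // RS_Distance(rast#1, geom#2) < 100.0
    println(newLabel("rast#1", "geom#2", "100.0")) // RS_DWithin(rast#1, geom#2, 100.0)
  }
}
```

The corrected label also carries the distance as a third argument, matching how the predicate is written in a query.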

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/strategy/join/TraitJoinQueryBase.scala | Adds an early return for empty shapes to keep NULL-substituted geometries from expanding into non-empty coarse filter envelopes. |
| spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/strategy/join/BroadcastIndexJoinExec.scala | Fixes raster distance-join EXPLAIN/simpleString rendering to show RS_DWithin(...) (the actual predicate). |


Comment on lines +191 to +196
// An empty shape (e.g. the empty GeometryCollection substituted for a NULL raster/geometry)
// must stay empty: expanding its degenerate envelope by `distance` would yield a non-empty
// filter geometry that spuriously matches rows the predicate should never join.
if (baseShape.isEmpty) {
  return baseShape
}
Development

Successfully merging this pull request may close these issues.

Raster distance join: NULL inputs produce spurious R-tree candidates, plus misleading RS_Distance EXPLAIN label
