feat: support interval types and make_ym_interval / make_dt_interval#4541
Draft
andygrove wants to merge 1 commit into
Draft
feat: support interval types and make_ym_interval / make_dt_interval#4541andygrove wants to merge 1 commit into
andygrove wants to merge 1 commit into
Conversation
…[skip ci] Implements the type-support prerequisite from issue apache#4540: add Spark YearMonthIntervalType and DayTimeIntervalType as physical types that round-trip through Comet's Arrow FFI, and route the make_ym_interval / make_dt_interval constructors through the JVM codegen dispatcher so they execute natively and match Spark exactly. Type plumbing: - proto: add YEAR_MONTH_INTERVAL (18) and DAY_TIME_INTERVAL (19) to DataTypeId. - native serde.rs: map them to Arrow Interval(YearMonth) and Duration(Microsecond) respectively. DayTime stores microseconds in an int64, which matches Duration(Microsecond) rather than the lossy Interval(DayTime) {days, millis}. - Utils.toArrowType / fromArrowType: same mapping on the JVM side. - QueryPlanSerde.serializeDataType: emit the new type ids. - CometBatchKernelCodegen: accept the two interval types in isSupportedDataType and resolve IntervalYearVector / DurationVector; emit primitive set() writes. Expressions: - make_ym_interval -> CometMakeYMInterval, make_dt_interval -> CometMakeDTInterval, both via CometCodegenDispatch, registered in temporalExpressions. CalendarIntervalType and interval arithmetic remain follow-ups under apache#4540. Tests: SQL file tests for both constructors assert answer parity and native execution (checkSparkAnswerAndOperator), covering column and literal inputs, defaults, negatives, and nulls.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Which issue does this PR close?
Part of #4540.
Rationale for this change
Comet had no support for Spark's interval data types, so every expression producing or consuming an interval forced a fallback to Spark, and the JVM codegen dispatcher (
CometCodegenDispatch) could not cover them either becauseCometBatchKernelCodegen.isSupportedDataTyperejected interval output. This PR adds the foundational type support and uses the existing codegen-dispatch mechanism to make the interval constructors run natively while matching Spark exactly.What changes are included in this PR?
Type support for
YearMonthIntervalTypeandDayTimeIntervalType:native/proto/src/proto/types.proto: addYEAR_MONTH_INTERVAL(18) andDAY_TIME_INTERVAL(19) toDataTypeId.native/core/src/execution/serde.rs: map them to ArrowInterval(YearMonth)andDuration(Microsecond).Utils.toArrowType/fromArrowType: same mapping on the JVM side.QueryPlanSerde.serializeDataType: emit the new type ids.CometBatchKernelCodegen: accept both interval types inisSupportedDataType, resolveIntervalYearVector/DurationVector, and emit primitiveset()writes.Representation note: Spark stores
DayTimeIntervalTypeas microseconds in anint64, which round-trips faithfully through ArrowDuration(Microsecond). ArrowInterval(DayTime)uses a{days, millis}layout that would lose microsecond precision, so it is intentionally not used.Expressions (via
CometCodegenDispatch, registered intemporalExpressions):make_ym_interval->CometMakeYMIntervalmake_dt_interval->CometMakeDTIntervalOut of scope (follow-ups tracked in #4540):
CalendarIntervalType, and interval arithmetic /multiply_*_interval/extractof interval fields.How are these changes tested?
SQL file tests under
spark/src/test/resources/sql-tests/expressions/datetime/(make_ym_interval.sql,make_dt_interval.sql) run byCometSqlFileTestSuite. Thequerymode usescheckSparkAnswerAndOperator, asserting both answer parity with Spark and native execution (a fallback fails the test). Cases cover column and literal inputs, default arguments, negatives, and nulls. The full SQL suite shows no new regressions.