Draft
44 commits
459f38b
fix: propagate parent struct null mask in GetStructField
schenksj May 30, 2026
37ee929
fix(shuffle): get_string tolerates non-UTF-8 bytes (lossy decode)
schenksj May 30, 2026
7786c5c
fix: decline native V1 scans on object_store-unsupported filesystem s…
schenksj May 30, 2026
094a6cb
Merge branch 'main' into fix/get-struct-field-null-mask
schenksj May 30, 2026
367f460
Merge branch 'main' into fix/get-string-lossy-utf8
schenksj May 30, 2026
8c1c260
Merge branch 'main' into fix/scheme-gate-object-store
schenksj May 30, 2026
50e7dad
fix: rebalance deep AND/OR chains to avoid protobuf recursion limit
schenksj May 30, 2026
e73729c
fix: materialize ConstantColumnVector on Comet's serialize/export paths
schenksj May 30, 2026
163aaa3
fix: decline CreateArray with struct-nullability-divergent children
schenksj May 30, 2026
fa97ca5
perf: O(1) PlanDataInjector lookup by op kind
schenksj May 30, 2026
bad57e5
feat: surface native parquet read failures as FAILED_READ_FILE
schenksj May 30, 2026
0ad7f7f
ci: register PlanDataInjectorSuite in PR build workflows
schenksj May 30, 2026
b457024
ci: register CometScanSchemeFallbackSuite in PR build workflows
schenksj May 30, 2026
8b1c417
test: make CometScanSchemeFallbackSuite compile under Spark 3.5 / Sca…
schenksj May 30, 2026
7e8b136
refactor: use imports instead of fully-qualified names in CometScanRule
schenksj May 30, 2026
52aea50
refactor: flatten AND/OR chains iteratively; test null operands and O…
schenksj May 30, 2026
5c83ea5
test: add end-to-end shuffle test for non-UTF-8 StringType bytes (#4521)
schenksj May 30, 2026
a1a29a0
fix: use withFallbackReason in scheme gate (leftover from #4508 rename)
schenksj May 30, 2026
6fdd1b5
fix: declare GetStructField output nullable when the parent struct is…
schenksj May 31, 2026
9a5c64e
fix: classify a missing file as FileNotFound (readCurrentFileNotFound…
schenksj May 31, 2026
571fce1
fix: address review feedback — spark-3.4 shim, version-stable asserti…
schenksj May 31, 2026
9d2154c
fix: address review — corrupt-footer wording + tidy the file-read thr…
schenksj May 31, 2026
66571c5
fix(shuffle): match JVM U+FFFD granularity in get_string decode
schenksj May 31, 2026
f5e6609
fix: default hdfs scheme to natively-readable in CometScanRule gate
schenksj May 31, 2026
7bb7be4
Merge remote-tracking branch 'origin/fix/get-struct-field-null-mask' …
schenksj May 31, 2026
ba9b586
Merge remote-tracking branch 'origin/fix/get-string-lossy-utf8' into …
schenksj May 31, 2026
e74008a
Merge remote-tracking branch 'origin/fix/scheme-gate-object-store' in…
schenksj May 31, 2026
76eedb3
Merge remote-tracking branch 'origin/fix/and-or-rebalance-recursion-l…
schenksj May 31, 2026
52375c5
Merge remote-tracking branch 'origin/fix/materialize-constant-column-…
schenksj May 31, 2026
7e41866
Merge remote-tracking branch 'origin/fix/create-array-mismatched-chil…
schenksj May 31, 2026
85dc340
Merge remote-tracking branch 'origin/fix/plandatainjector-o1-lookup' …
schenksj May 31, 2026
055c837
Merge remote-tracking branch 'origin/fix/failed-read-file-wrapping' i…
schenksj May 31, 2026
4ebb9db
contrib(delta): native Delta scan crate + core glue
schenksj May 30, 2026
4767f1d
contrib(delta): Spark integration (reflection bridge, scan rule, execs)
schenksj May 30, 2026
d2a4a24
contrib(delta): build wiring -- Maven contrib-delta profile + Cargo f…
schenksj May 30, 2026
0a59ebb
test(contrib-delta): Delta integration test suites
schenksj May 30, 2026
6d9325b
contrib(delta): docs, regression harness, and CI
schenksj May 30, 2026
fc2aca2
fix(contrib-delta): address code-review feedback (native panics, sche…
schenksj May 31, 2026
d6c0568
fix(contrib-delta): empty-partition fast path must emit synthetic-col…
schenksj Jun 1, 2026
1160a16
test(contrib-delta): de-flake merge-metrics numTargetFilesAdded for n…
schenksj Jun 1, 2026
87e657c
fix: enable parquet type promotion for Delta type-widening reads
schenksj Jun 3, 2026
42c341f
test: de-flake merge-metrics numTargetFilesAdded in Spark 4.0/4.1 Del…
schenksj Jun 3, 2026
d592f0e
refactor(contrib-delta): merge DV-filter into one DV-sweep exec
schenksj Jun 3, 2026
8f8b7d4
docs(contrib-delta): clarify DV-sweep guard, type-promotion scope, me…
schenksj Jun 4, 2026
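The workflow added in this PR, `.github/workflows/delta_contrib_test.yml`, builds its `delta-contrib-scala` matrix purely from `include` entries, and GitHub Actions runs each such entry as a separate job. The Python sketch that follows is a simplified model, not GitHub's own matrix expander. It renders the job name, the Maven profile list and the surefire report artifact name from the three entries. The function names are hypothetical, and only the Spark, Delta and Java versions are copied from the workflow.

```python
# Simplified model of the delta-contrib-scala matrix (hypothetical helpers,
# not GitHub's expander). Versions are copied from the workflow's `include` list.
MATRIX_INCLUDE = [
    {"spark-version": {"short": "3.5", "full": "3.5.8"}, "delta-version": "3.3.2", "java-version": 17},
    {"spark-version": {"short": "4.0", "full": "4.0.1"}, "delta-version": "4.0.0", "java-version": 17},
    {"spark-version": {"short": "4.1", "full": "4.1.1"}, "delta-version": "4.1.0", "java-version": 17},
]


def job_name(cell: dict) -> str:
    # Mirrors: delta-contrib/spark-${{ matrix.spark-version.full }}/delta-${{ matrix.delta-version }}
    return f"delta-contrib/spark-{cell['spark-version']['full']}/delta-{cell['delta-version']}"


def maven_profiles(cell: dict) -> str:
    # Mirrors: -Pspark-${{ matrix.spark-version.short }},contrib-delta
    return f"-Pspark-{cell['spark-version']['short']},contrib-delta"


def report_artifact(cell: dict) -> str:
    # Mirrors the surefire report artifact name uploaded when a cell fails.
    return f"surefire-reports-spark-{cell['spark-version']['short']}-delta-{cell['delta-version']}"


if __name__ == "__main__":
    # With no base matrix dimensions, each `include` entry becomes one job.
    for cell in MATRIX_INCLUDE:
        print(job_name(cell), maven_profiles(cell), report_artifact(cell))
```

Encoding both the Spark and Delta versions in the report name keeps artifact names unique per cell. This matters because recent major versions of `actions/upload-artifact` refuse a second upload under an existing artifact name in the same run.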
.github/workflows/delta_contrib_test.yml (new file: 228 additions, 0 deletions)
@@ -0,0 +1,228 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

# Runs the contrib-delta Scala test suites on all three supported (Spark, Delta)
# version pairs, plus the build-gate verification script that proves default
# builds carry zero Delta surface. Modeled on iceberg_spark_test.yml.
#
# Three jobs:
#   1. build-native        -- builds libcomet.so once with --features
#                             contrib-delta, runs the contrib-delta crate's
#                             cargo tests and uploads the library as an artifact.
#   2. delta-contrib-scala -- matrix over (Spark 3.5 + Delta 3.3.2),
#                             (Spark 4.0 + Delta 4.0.0) and
#                             (Spark 4.1 + Delta 4.1.0); downloads the
#                             native lib and runs all 25 contrib Scala
#                             suites per matrix cell.
#   3. delta-build-gate    -- cheap independent job; runs
#                             dev/verify-contrib-delta-gate.sh, which proves
#                             default cargo / mvn / dylib builds carry zero
#                             Delta surface. Runs in parallel.

name: Delta Lake Contrib Tests

concurrency:
  group: ${{ github.repository }}-${{ github.head_ref || github.sha }}-${{ github.workflow }}
  cancel-in-progress: true

on:
  push:
    branches:
      - main
    paths-ignore:
      - "benchmarks/**"
      - "doc/**"
      - "docs/**"
      - "**.md"
      - "dev/changelog/*.md"
      - "native/core/benches/**"
      - "native/spark-expr/benches/**"
      - "spark/src/main/scala/org/apache/comet/GenerateDocs.scala"
      - "spark-integration/**"
  pull_request:
    paths-ignore:
      - "benchmarks/**"
      - "doc/**"
      - "docs/**"
      - "**.md"
      - "dev/changelog/*.md"
      - "native/core/benches/**"
      - "native/spark-expr/benches/**"
      - "spark/src/main/scala/org/apache/comet/GenerateDocs.scala"
      - "spark-integration/**"
  workflow_dispatch:

permissions:
  contents: read

env:
  RUST_VERSION: stable
  RUST_BACKTRACE: 1
  # Force GNU ld on Linux to match the rest of Comet's CI (rust-lld can't
  # resolve -ljvm against the Zulu JDK layout installed by setup-java).
  RUSTFLAGS: "-Clink-arg=-fuse-ld=bfd"

jobs:
  # Build libcomet ONCE with the contrib-delta feature and share it with all
  # three matrix cells via an artifact upload/download. Mirrors iceberg_spark_test.yml.
  build-native:
    name: Build Native Library (contrib-delta)
    runs-on: ubuntu-24.04
    container:
      image: amd64/rust
    steps:
      - uses: actions/checkout@v6

      - name: Setup Rust & Java toolchain
        uses: ./.github/actions/setup-builder
        with:
          rust-version: ${{ env.RUST_VERSION }}
          jdk-version: 17

      - name: Restore Cargo cache
        uses: actions/cache/restore@v5
        with:
          path: |
            ~/.cargo/registry
            ~/.cargo/git
            native/target
          key: ${{ runner.os }}-cargo-ci-contrib-delta-${{ hashFiles('native/**/Cargo.lock', 'native/**/Cargo.toml', 'contrib/delta/native/**/Cargo.toml') }}-${{ hashFiles('native/**/*.rs', 'contrib/delta/native/**/*.rs') }}
          restore-keys: |
            ${{ runner.os }}-cargo-ci-contrib-delta-${{ hashFiles('native/**/Cargo.lock', 'native/**/Cargo.toml', 'contrib/delta/native/**/Cargo.toml') }}-

      - name: Build native library with contrib-delta
        run: |
          cd native && cargo build --profile ci --features contrib-delta
        env:
          RUSTFLAGS: "-Ctarget-cpu=x86-64-v3 -Clink-arg=-fuse-ld=bfd"

      - name: Run cargo tests for contrib-delta crate
        run: |
          # The crate's own unit + integration tests (e.g. proto<->kernel descriptor
          # round-trip, DV-decode error mapping, schema-adapter behaviour) live in the
          # contrib-delta crate itself. Without this step, broken tests can slip past
          # CI because the build step only validates compilation, not behaviour.
          cd contrib/delta/native && cargo test --release
        env:
          RUSTFLAGS: "-Ctarget-cpu=x86-64-v3 -Clink-arg=-fuse-ld=bfd"

      - name: Save Cargo cache
        uses: actions/cache/save@v5
        if: github.ref == 'refs/heads/main'
        with:
          path: |
            ~/.cargo/registry
            ~/.cargo/git
            native/target
          key: ${{ runner.os }}-cargo-ci-contrib-delta-${{ hashFiles('native/**/Cargo.lock', 'native/**/Cargo.toml', 'contrib/delta/native/**/Cargo.toml') }}-${{ hashFiles('native/**/*.rs', 'contrib/delta/native/**/*.rs') }}

      - name: Upload native library
        uses: actions/upload-artifact@v7
        with:
          name: native-lib-contrib-delta
          path: native/target/ci/libcomet.so
          retention-days: 1

  # Run all 25 contrib Scala suites across every (Spark, Delta) version
  # pair. The matrix asserts feature parity: the same suites must pass on
  # Spark 3.5 + Delta 3.3.2, Spark 4.0 + Delta 4.0.0 AND Spark 4.1 + Delta 4.1.0.
  delta-contrib-scala:
    needs: build-native
    strategy:
      matrix:
        include:
          - spark-version: { short: '3.5', full: '3.5.8' }
            delta-version: '3.3.2'
            scala-version: '2.13'
            java-version: 17
          - spark-version: { short: '4.0', full: '4.0.1' }
            delta-version: '4.0.0'
            scala-version: '2.13'
            java-version: 17
          - spark-version: { short: '4.1', full: '4.1.1' }
            delta-version: '4.1.0'
            scala-version: '2.13'
            java-version: 17
      fail-fast: false
    name: delta-contrib/spark-${{ matrix.spark-version.full }}/delta-${{ matrix.delta-version }}
    runs-on: ubuntu-24.04
    container:
      image: amd64/rust
    env:
      SPARK_LOCAL_IP: localhost
    steps:
      - uses: actions/checkout@v6

      - name: Setup Rust & Java toolchain
        uses: ./.github/actions/setup-builder
        with:
          rust-version: ${{ env.RUST_VERSION }}
          jdk-version: ${{ matrix.java-version }}

      - name: Download native library
        uses: actions/download-artifact@v8
        with:
          name: native-lib-contrib-delta
          # Comet's test JVM looks under native/target/debug/ first, then
          # native/target/release/. The ci build profile writes to
          # native/target/ci/, so the library is placed in debug/ to satisfy the loader.
          path: native/target/debug/

      - name: Run contrib-delta Scala test suites
        # Run every contrib-delta suite (functional + regression/repro/audit), not a
        # curated subset -- the repro suites are behaviour guards and must run in CI to
        # catch regressions. wildcardSuites matches by package prefix; the bulk live in
        # org.apache.comet.contrib.delta, plus one in org.apache.spark.sql.delta.
        run: |
          ./mvnw -Pspark-${{ matrix.spark-version.short }},contrib-delta \
            -pl spark -am test \
            -DwildcardSuites='org.apache.comet.contrib.delta,org.apache.spark.sql.delta.CometDeltaCheckpointFilterReproSuite' \
            -Djava.version=${{ matrix.java-version }} \
            -Dmaven.compiler.source=${{ matrix.java-version }} \
            -Dmaven.compiler.target=${{ matrix.java-version }} \
            -Dmaven.gitcommitid.skip

      - name: Upload surefire reports on failure
        if: failure()
        uses: actions/upload-artifact@v7
        with:
          name: surefire-reports-spark-${{ matrix.spark-version.short }}-delta-${{ matrix.delta-version }}
          path: spark/target/surefire-reports/
          retention-days: 5

  # Independent of build-native: a cheap proof that default builds carry
  # zero Delta surface. Runs the dev/verify-contrib-delta-gate.sh script,
  # which does its own cargo + mvn invocations across both feature
  # combinations.
  delta-build-gate:
    name: Build-gate verification (default builds exclude Delta)
    runs-on: ubuntu-24.04
    container:
      image: amd64/rust
    env:
      SPARK_LOCAL_IP: localhost
    steps:
      - uses: actions/checkout@v6

      - name: Setup Rust & Java toolchain
        uses: ./.github/actions/setup-builder
        with:
          rust-version: ${{ env.RUST_VERSION }}
          jdk-version: 17

      - name: Run dev/verify-contrib-delta-gate.sh
        run: |
          dev/verify-contrib-delta-gate.sh
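The `build-native` job restores its Cargo cache with a primary `key` (Cargo manifest hash plus Rust source hash) and a single `restore-keys` prefix (manifest hash only), and saves only on `main`. The following Python sketch is a simplified model of the lookup described in the GitHub documentation for `actions/cache`. It assumes: exact hit on the key first; otherwise each restore key in order, exact match first, then the most recently created cache whose key starts with it. It ignores branch scoping, cache versions and eviction. `CacheEntry`, `resolve_cache` and the `LOCK*`/`RS*` hash placeholders are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class CacheEntry:
    key: str
    created_at: int  # larger value = created more recently


def resolve_cache(key: str, restore_keys: Sequence[str],
                  entries: Sequence[CacheEntry]) -> Optional[CacheEntry]:
    """Simplified actions/cache lookup model (no branch scope, no cache version)."""
    # 1. An exact match on the primary key is a cache hit.
    for entry in entries:
        if entry.key == key:
            return entry
    # 2. On a miss, try each restore key in order: an exact match first,
    #    otherwise the newest cache whose key starts with the restore key.
    for restore_key in restore_keys:
        for entry in entries:
            if entry.key == restore_key:
                return entry
        partial = [e for e in entries if e.key.startswith(restore_key)]
        if partial:
            return max(partial, key=lambda e: e.created_at)
    return None


if __name__ == "__main__":
    # Hypothetical keys: "LOCK*" stands in for the manifest hash, "RS*" for the source hash.
    prefix = "Linux-cargo-ci-contrib-delta-"
    caches = [
        CacheEntry(prefix + "LOCK1-RSA", created_at=1),
        CacheEntry(prefix + "LOCK1-RSB", created_at=2),
        CacheEntry(prefix + "LOCK2-RSC", created_at=3),
    ]
    # Only .rs files changed: the primary key misses, and the prefix restores RSB.
    hit = resolve_cache(prefix + "LOCK1-RSZ", [prefix + "LOCK1-"], caches)
    print(hit.key if hit else "miss")  # prints Linux-cargo-ci-contrib-delta-LOCK1-RSB
```

Under this model, a pull request that edits Rust sources but not Cargo manifests misses the primary key. It still restores the newest matching cache saved from `main`, so cargo can reuse already-compiled dependencies instead of starting cold.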