bundle/fuzz: add create-payload parity fuzz test for terraform vs direct#5686
Open
radakam wants to merge 14 commits into
Open
bundle/fuzz: add create-payload parity fuzz test for terraform vs direct#5686radakam wants to merge 14 commits into
radakam wants to merge 14 commits into
Conversation
Implements the first technique from DECO-25361: generate random job
configs and check for differences in the create payload between the
terraform and direct deploy engines.
Both engines run the same `bundle deploy` pipeline in-process (via
testcli) against a testserver, differing only in DATABRICKS_BUNDLE_ENGINE,
and the POST /api/2.2/jobs/create body each sends is captured and diffed.
Because only the engine differs, shared mutators cancel out and any
remaining diff is a genuine engine divergence.
The fuzzer already surfaced two real (benign) divergences, documented in
DefaultIgnorePaths:
- num_workers: 0 is sent explicitly by terraform but dropped by direct
(omitempty).
- the terraform provider strips the deprecated spark conf
"spark.databricks.delta.preview.enabled"; direct forwards it.
Run with: go test ./bundle/fuzz -run TestJobCreateParity
(FUZZ_SEEDS overrides the seed count; auto-skips when terraform is not
provisioned via acceptance/install_terraform.py).
…ignore Address golangci-lint failures (intrange loops, strconv.FormatBool over fmt.Sprintf) and tighten the create-payload ignore list: drop the dead job_clusters num_workers entry (those are at parity) and document the task-level num_workers divergence as a real CLI gap to fix separately.
Collaborator
Integration test reportCommit: bdddd24
22 interesting tests: 13 SKIP, 7 KNOWN, 1 flaky, 1 RECOVERED
Top 4 slowest tests (at least 2 minutes):
|
- Add a `test-fuzz` task and a nightly CI job that provisions terraform and runs the create-payload parity tests. They previously always skipped because terraform was never provisioned in the test path. - Ignore repo-root build/ so the provisioned terraform binary and provider mirror are not accidentally committed. - Skip cleanly when build/ is only partially provisioned (missing provider mirror or .terraformrc) instead of failing mid-deploy. - Document that the harness covers jobs only for now (DECO-25361).
Make the create-payload parity fuzz suite explore new configs over time and be reproducible from a reported seed: - FUZZ_SEED (comma-separated) runs exactly those seeds, overriding the range, so a reported divergence reproduces with one command. The failure message now prints this knob. - FUZZ_SEED_OFFSET shifts the deterministic window; push.yml derives it from GITHUB_RUN_NUMBER so each nightly run checks seeds it has never tested before instead of re-checking a fixed set. Windows are non-overlapping because the run number is unique and monotonic. - Guard FUZZ_SEEDS > 0 so a negative value no longer panics make() and zero no longer passes as a no-op. - Drop the test-fuzz Task sources fingerprint: the seeds depend on env vars Task can't see, so skipping on an unchanged checksum would silently no-op a repro run or a shifted window. - Keep the nightly window modest (25); exploration comes from rotation, not size, and it can be raised once nightly timings are known.
The terraform provider force-sends num_workers: 0 for a single-node new_cluster (no autoscale) on both job_clusters and task-level clusters, but JobClustersFixups only applied initializeNumWorkers to job_clusters. The direct engine therefore omitted num_workers on task clusters, so the two engines produced divergent create payloads. This divergence was surfaced by the bundle/fuzz parity harness. Apply initializeNumWorkers to task new_cluster too so the direct engine matches terraform, and drop the now-obsolete tasks[*].new_cluster.num_workers entry from the fuzz DefaultIgnorePaths.
The nightly test-fuzz job is intentionally excluded from test-result, so a failure was only visible in the Actions tab. Add a failure step that opens (or comments on) a single deduped GitHub issue with a one-command repro. Also correct the jobsCreatePath comment: a different API version shows up as a capture failure (the testserver registers only this route, so a mismatched version 404s and the deploy fails), not as a payload diff.
…ion test Rename the capture/deploy/recorder helpers to *_test.go so the parity harness compiles only under `go test` instead of into the package's regular build, and add a committed regression test (cluster_fixups_test.go) covering the single-node task-cluster num_workers force-send fix so the divergence is guarded at PR time, not just in the nightly suite.
…ting Move the remaining generator/diff/rand implementation into _test.go files (keeping only a doc.go for the package comment) so nothing in the harness compiles into the regular build, since no product code imports it. Distinguish deploy/capture failures from create-payload divergences in checkJobParity: skip when neither engine deploys the generated config, fail distinctly when exactly one engine accepts it (an acceptance divergence, not a payload diff), and only diff payloads when both deploys succeed. This keeps nightly triage from misdirecting a deploy failure into regressionSeeds. Also document the unique-identity-key assumption in diffKeyedSlice.
Use strings.SplitSeq instead of ranging over strings.Split (modernize stringsseq) and require.Positivef instead of require.Greaterf(t, n, 0) (testifylint negative-positive).
The failure-reporting step used `gh issue list --jq '.[0].number'`, which prints the literal "null" when no open issue exists, so it always took the comment branch and tried to comment on issue "null" instead of creating one. Use `// empty` so the create branch runs on the first divergence.
Revert the num_workers single-node task-cluster fix along with its unit test and acceptance updates so this PR adds only the parity harness. Both terraform/direct divergences the harness found are now documented and suppressed via DefaultIgnorePaths rather than fixed (fixes follow separately): num_workers on single-node task clusters (seed 29) and the spark.databricks.delta.preview.enabled spark conf key.
Address review feedback on the create-payload parity harness: - Replace the path-only ignore list with value-conditional ignore rules so the documented num_workers divergence (direct omits, terraform force-sends 0) is suppressed only for that exact shape; a real value mismatch at the same path now fails again. - Unexport package-internal identifiers (generateJob, diffPayloads, difference, defaultIgnoreRules) that are only used within the package. - Document why TestCaptureJobCreateDirect is intentionally not opt-in. - Reword the one-sided-deploy failures as deploy/capture differences rather than asserting one engine "rejected" the config. - Make TestParitySeeds hermetic against ambient FUZZ_* env vars. - Correct the seed 29 comment to reflect that the divergence is suppressed.
The terraform provider force-sends num_workers:0 for a single-node new_cluster on task-level clusters too, not just shared job_clusters, but prepareJobSettingsForUpdate only applied initializeNumWorkers to job_clusters. The direct engine therefore omitted num_workers on task clusters and the two engines produced divergent create payloads (found by the bundle/fuzz parity harness, seed 29). Apply initializeNumWorkers to task new_cluster too so the direct engine matches terraform, drop the now-obsolete tasks[*].new_cluster.num_workers ignore entry, and simplify the fuzz ignore list to a plain []string now that value-conditional matching is no longer needed.
Contributor
Approval status: pending
|
denik
reviewed
Jun 26, 2026
| // tends to differ between engines. | ||
| // | ||
| // TODO(DECO-25361): generalize the harness across resource kinds. | ||
| func generateJob(rng *rand.Rand) *resources.Job { |
Contributor
There was a problem hiding this comment.
Interesting approach.
I was thinking more of using bundle schema so we can fuzz anything, not just jobs.
Switch the fuzz suite from comparing terraform and direct create payloads to asserting invariants on the direct engine's payload. Terraform and direct can disagree for legitimate reasons, so a payload diff is noisy; an invariant has no legitimate reason to fail, so a failure is a real bug. This drops the payload diff and its ignore-list of documented divergences, and removes terraform from the harness (each seed is now one in-process direct deploy). Gate on `bundle validate` so the suite distinguishes the two fuzzing outcomes: an invalid config skips (it can't violate an invariant), while a validated config that fails to deploy or breaks an invariant fails. This is the distinction a looser, schema-driven generator will rely on. Revert the num_workers:0 force-send for single-node task clusters (and its acceptance goldens): it only matched terraform's payload, with no demonstrated behavior benefit, and direct has shipped without it. If a real backend requirement is confirmed, it can return as a standalone change.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
A test-only
bundle/fuzzpackage that randomly generates job configs, deploys each under both the terraform and direct engines against an in-process testserver, and diffs the capturedjobs/createpayloads. Seeds are deterministic, so any divergence reproduces from the printedFUZZ_SEED=<n>.Runs nightly (
task test-fuzz), opt-in viaFUZZ_*and excluded from per-PR/merge-queue checks since each seed runs two real deploys; failures surface as a deduped GitHub issue. Jobs only for now.Why
We need confidence the direct engine produces the same API payloads as terraform during migration. This gives systematic, reproducible coverage and a nightly drift signal.
Divergences found
num_workerson single-node task clusters (seed 29): terraform force-sendsnum_workers: 0where direct omitted it. Fixed:prepareJobSettingsForUpdatenow appliesinitializeNumWorkersto tasknew_clustertoo, not just sharedjob_clusters. Seed 29 stays inregressionSeedsto guard against regression.spark.databricks.delta.preview.enabled: terraform strips this deprecatedspark_confkey; direct forwards it. The backend ignores it either way, so it stays indefaultIgnorePaths(benign, no fix needed).Testing
go test ./bundle/fuzz/...and./bundle/config/mutator/resourcemutator/...— diff engine, generators, and thenum_workersfix.task test-fuzz— full terraform-vs-direct parity run (nightly).num_workers: 0on single-node task clusters under both engines.