feat(run-ops): webapp write path — trigger/batch minting, idempotency routing, run lifecycle by d-cs · Pull Request #4118 · triggerdotdev/trigger.dev

d-cs · 2026-07-02T17:39:05Z

What

Routes the webapp write path through the run-ops split seam: trigger/batch minting, idempotency-key resolution, and the run-lifecycle services now determine residency and dispatch writes to the correct store.

Trigger & batch (runEngine/services/triggerTask.server.ts, batchTrigger.server.ts, createBatch.server.ts, streamBatchItems.server.ts, v3/services/batchTriggerV3.server.ts): mint ids with the run-ops-aware minting and route creation/streaming through the store; batch children inherit the parent's residency.
Idempotency (runEngine/concerns/idempotencyKeys.server.ts + new idempotencyResidency.server.ts): idempotency-key lookup/dedup is residency-aware so a keyed retrigger resolves against the store that owns the original run.
Run lifecycle services (createCheckpoint, createTaskRunAttempt, enqueueDelayedRun, expireEnqueuedRun, finalizeTaskRun, resumeBatchRun, cancelDevSessionRuns, executeTasksWaitingForDeploy, triggerFailedTask): resolve their target run through the store rather than a fixed client.
Reads that fan out from writes (runsRepository + clickhouseRunsRepository, BulkActionV2 + batch read-through, realtime sessions/runReader, alerts deliverAlert/performTaskRunAlerts): route through the read-through resolver.
9535ae63d — resolves the parent run through an injectable run store in TriggerFailedTaskService.
bf8f7c881 — drops the "known-migrated" concept from write-path and read repos; residency is id-shape only.
515b897ea — self-defaults resolveWaitpointThroughReadThrough to the safe run-ops clients.
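The id-shape rule behind these residency decisions can be sketched roughly as follows. This is an illustrative sketch, not the PR's code: `classifyByIdShape` and `resolveInheritedMintKindSketch` are hypothetical names, the 27-character ksuid length is taken from a test comment in this PR, and the 25-character `c`-prefixed cuid pattern, the `<prefix>_<id>` friendly-id layout, and the fallback used for a missing or unclassifiable parent are all assumptions.

```typescript
type Residency = "NEW" | "LEGACY";

// Assumption: a friendly id is "<prefix>_<id>", for example "run_<id>".
function stripFriendlyPrefix(friendlyId: string): string {
  const separator = friendlyId.indexOf("_");
  return separator === -1 ? friendlyId : friendlyId.slice(separator + 1);
}

// Hypothetical classifier: residency is decided from the id's shape alone.
function classifyByIdShape(friendlyId: string): Residency | undefined {
  const id = stripFriendlyPrefix(friendlyId);
  // 27 base62 characters: treated as a ksuid, which classifies NEW.
  if (/^[0-9A-Za-z]{27}$/.test(id)) return "NEW";
  // Assumed cuid shape: "c" followed by 24 lowercase alphanumerics.
  if (/^c[0-9a-z]{24}$/.test(id)) return "LEGACY";
  return undefined;
}

// A child or batch inherits the parent's residency; with no parent (or an
// unclassifiable one) this sketch falls back to a caller-supplied kind.
function resolveInheritedMintKindSketch(
  parentFriendlyId: string | undefined,
  fallback: Residency
): Residency {
  if (parentFriendlyId === undefined) return fallback;
  return classifyByIdShape(parentFriendlyId) ?? fallback;
}
```

Because the check is synchronous and takes no injected dependencies, a mint site can call it inline before choosing which store to write to.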

Why

PR6 of the run-ops split stack. This is the write-path counterpart to the read foundation in the previous PRs: with it in place, both reads and writes route through the seam. Additive when the split is disabled (id-shape resolution collapses to the control-plane client); behavior-changing on the minting, idempotency, and lifecycle paths when enabled.

Tests

Large new/expanded vitest suite under apps/webapp/test/ and colocated service tests: trigger-task and batch-trigger store routing, residency inheritance, idempotency dedup residency + legacy-authority, bulk-action read routing, cancel-dev-session routing, alerts store routing, runs-repository read-through, realtime session/run-reader read-through and stream-registration routing, and the waitpoint read-through default. Testcontainers-backed; no mocks.

Notes

Draft, stacked on #4117 (runops/pr05-webapp-foundation). Review that first; this diff is against it.

Server-change / changeset note to be added at stack-assembly time.

🤖 Generated with Claude Code

changeset-bot · 2026-07-02T17:39:09Z

⚠️ No Changeset found

Latest commit: 4bda37a

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

coderabbitai · 2026-07-02T17:39:15Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: ffc3cf87-a202-40d1-9339-d287c1dd2cbd

devin-ai-integration

Devin Review found 1 potential issue.

pkg-pr-new · 2026-07-02T18:06:10Z

@trigger.dev/build

npm i https://pkg.pr.new/@trigger.dev/build@4bda37a

trigger.dev

npm i https://pkg.pr.new/trigger.dev@4bda37a

@trigger.dev/core

npm i https://pkg.pr.new/@trigger.dev/core@4bda37a

@trigger.dev/python

npm i https://pkg.pr.new/@trigger.dev/python@4bda37a

@trigger.dev/react-hooks

npm i https://pkg.pr.new/@trigger.dev/react-hooks@4bda37a

@trigger.dev/redis-worker

npm i https://pkg.pr.new/@trigger.dev/redis-worker@4bda37a

@trigger.dev/rsc

npm i https://pkg.pr.new/@trigger.dev/rsc@4bda37a

@trigger.dev/schema-to-json

npm i https://pkg.pr.new/@trigger.dev/schema-to-json@4bda37a

@trigger.dev/sdk

npm i https://pkg.pr.new/@trigger.dev/sdk@4bda37a

commit: 4bda37a

coderabbitai

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

apps/webapp/app/runEngine/concerns/idempotencyKeys.server.ts (1)
245-294: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

blockRunWithWaitpoint still writes via this.prisma, not the resolved dedupClient.

dedupClient (computed above at lines 155-174) is derived using parentRunFriendlyId: request.body.options?.parentRunId, the same parentRunId used here to block the parent run's waitpoint (lines 246 and 279). dedupClient is precisely the client that owns this parent run's residency, yet the transaction at line 290 still passes tx: this.prisma, the (possibly wrong) fallback client.

If split mode is enabled and the parent run resides on the "new" store while this.prisma targets the legacy store (or vice versa), this write would target the wrong database, failing to find the parent run row or silently writing state to a store that doesn't own it — contradicting the PR's core objective of routing writes to the store that owns the target run.
🐛 Proposed fix
             await this.engine.blockRunWithWaitpoint({
               runId: RunId.fromFriendlyId(parentRunId),
               waitpoints: associatedWaitpoint!.id,
               spanIdToComplete: spanId,
               batch: request.options?.batchId
                 ? {
                     id: request.options.batchId,
                     index: request.options.batchIndex ?? 0,
                   }
                 : undefined,
               projectId: request.environment.projectId,
               organizationId: request.environment.organizationId,
-              tx: this.prisma,
+              tx: dedupClient,
             });

🧹 Nitpick comments (8)

apps/webapp/app/runEngine/concerns/resolveWaitpointThroughReadThrough.server.ts (1)

44-49: 🚀 Performance & Scalability | 🔵 Trivial | 💤 Low value

Consider forwarding logger/onLegacyReplicaRead for parity with other read-through consumers.

ReadThroughDeps supports logger and onLegacyReplicaRead (saturation-signal hook), but ResolveWaitpointDeps/this wrapper drop both, so legacy-replica reads for waitpoints won't emit the saturation signal that other read-through call sites presumably rely on for monitoring split-read health.
apps/webapp/app/runEngine/services/batchTrigger.server.ts (2)
92-99: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Discards the minted id and re-derives it via BatchId.fromFriendlyId.

mintBatchFriendlyId returns { id, friendlyId }, but only friendlyId is kept here; id is recomputed later via BatchId.fromFriendlyId(batchId) (lines 175, 266). createBatch.server.ts uses the returned id directly instead of re-deriving it. Functionally likely equivalent if BatchId.fromFriendlyId is a lossless decode, but it's redundant work and an inconsistency between two services from the same PR doing the same job.
♻️ Proposed consistency fix
-          const { friendlyId } = await mintBatchFriendlyId({
+          const { id, friendlyId } = await mintBatchFriendlyId({
             environment: {
               organizationId: environment.organizationId,
               id: environment.id,
               orgFeatureFlags: environment.organization.featureFlags,
             },
             parentRunFriendlyId: body.parentRunId,
           });
Then thread id through to #createAndProcessBatchTaskRun and use it directly instead of BatchId.fromFriendlyId(batchId).
Please confirm BatchId.fromFriendlyId reliably reconstructs the same id for both ksuid- and cuid-shaped friendly ids before treating this purely as a style nit.

Also applies to: 169-184, 265-275

359-374: 🩺 Stability & Availability | 🔵 Trivial | ⚡ Quick win

Missing batch is silently ignored — no log emitted.

When findBatchTaskRunById returns nothing, the function returns silently, unlike the environment miss two lines below which logs an error. Given store-routing bugs could make a batch invisible from the wrong store, this failure mode deserves the same observability.
🔍 Proposed fix
     const batch = await this._engine.runStore.findBatchTaskRunById(options.batchId);

     if (!batch) {
+      logger.error("[RunEngineBatchTrigger][processBatchTaskRun] Batch not found", {
+        options,
+      });
       return;
     }
apps/webapp/test/engine/streamBatchItems.test.ts (1)

655-662: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Repeated PostgresRunStore wiring across 7 test cases.

The same 4-line block assigning engine.runStore = new PostgresRunStore({ prisma: racingPrisma, readOnlyPrisma: racingPrisma }) appears 7 times. A small helper (e.g. attachRacingRunStore(engine, racingPrisma)) would reduce duplication and centralize any future changes to how the racing store is wired.

Also applies to: 787-794, 919-926, 1052-1059, 1272-1279, 1411-1418, 1600-1604
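A minimal sketch of the suggested helper, under stated assumptions: `attachRacingRunStore` is the name proposed in this comment, while `PrismaLike`, `EngineLike`, and the local `PostgresRunStore` class are stand-ins so the snippet is self-contained. The real store and client types live in the repository and may differ.

```typescript
// Stand-in for the Prisma client type used by the tests (assumption).
interface PrismaLike {
  readonly label: string;
}

// Stand-in for the repository's PostgresRunStore; only the constructor
// options named in this review comment are modelled.
class PostgresRunStore {
  readonly options: { prisma: PrismaLike; readOnlyPrisma: PrismaLike };

  constructor(options: { prisma: PrismaLike; readOnlyPrisma: PrismaLike }) {
    this.options = options;
  }
}

// Stand-in for the engine under test: only the runStore slot matters here.
interface EngineLike {
  runStore: PostgresRunStore | undefined;
}

// Wires the racing client as both the writer and the read-only client,
// which is what the repeated block in each test case does.
function attachRacingRunStore(engine: EngineLike, racingPrisma: PrismaLike): PostgresRunStore {
  const store = new PostgresRunStore({ prisma: racingPrisma, readOnlyPrisma: racingPrisma });
  engine.runStore = store;
  return store;
}
```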
apps/webapp/app/v3/services/createCheckpoint.server.ts (1)
149-154: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Batch lookup correctly routed through RunStore.

Both WAIT_FOR_BATCH lookups now call runStore.findBatchTaskRunByFriendlyId(friendlyId, environmentId), matching the upstream contract that probes NEW then LEGACY sub-stores scoped by (friendlyId, environmentId). This correctly fixes the pre-existing gap where a raw single-DB Prisma query would miss a NEW-resident (ksuid) batch.

The identical 5-line lookup + comment block is duplicated at both call sites (149-154 and 364-369). Extracting a small private helper would remove the duplication and centralize any future changes to the routing logic.
♻️ Proposed extraction
+  // Routed by friendlyId so a ksuid (NEW-resident) batch is found on the owning DB;
+  // env-scoped to the dependent attempt's run (a batch shares its dependent's env).
+  private async findWaitForBatchRun(batchFriendlyId: string, environmentId: string) {
+    return this.runStore.findBatchTaskRunByFriendlyId(batchFriendlyId, environmentId);
+  }
Then replace both call sites with:
-        // Routed by friendlyId so a ksuid (NEW-resident) batch is found on the owning DB;
-        // env-scoped to the dependent attempt's run (a batch shares its dependent's env).
-        const batchRun = await this.runStore.findBatchTaskRunByFriendlyId(
-          reason.batchFriendlyId,
-          attempt.taskRun.runtimeEnvironmentId
-        );
+        const batchRun = await this.findWaitForBatchRun(
+          reason.batchFriendlyId,
+          attempt.taskRun.runtimeEnvironmentId
+        );
Also applies to: 364-369
apps/webapp/app/v3/services/executeTasksWaitingForDeploy.ts (1)

74-111: 🩺 Stability & Availability | 🔵 Trivial

Solid defense-in-depth split; consider a quarantine path for stuck NEW-resident runs.

The NEW/legacy split correctly prevents a control-plane updateMany/enqueue from touching runs it can't actually own. One residual risk: if a NEW-resident run keeps getting selected by findRuns (e.g. from a real misconfiguration), it will never transition out of WAITING_FOR_DEPLOY, so this job will re-log the same error and potentially keep rescheduling itself (via the runsWaitingForDeploy.length > maxCount reschedule) on every poll, indefinitely.

Consider adding a metric/alert-worthy signal or a way to skip re-selecting known-stuck NEW-resident runs.
apps/webapp/test/engine/triggerTask.test.ts (1)
2393-2402: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Redundant dynamic import() — generateKsuidId is already statically imported.

generateKsuidId is imported statically at the top of the file (line 33) and used directly a few lines later (line 2432: generateKsuidId()). The dynamic await import("@trigger.dev/core/v3/isomorphic") here is unnecessary and inconsistent with the static-import usage elsewhere in the same file.
♻️ Proposed fix
       const parentFriendlyId = RunId.toFriendlyId(
-        // 27-char ksuid → classifies NEW
-        (await import("@trigger.dev/core/v3/isomorphic")).generateKsuidId()
+        // 27-char ksuid → classifies NEW
+        generateKsuidId()
       );
As per coding guidelines: "Prefer static imports over dynamic import(), and only use dynamic imports for unresolved circular dependencies, genuine code-splitting needs, or conditional runtime loading."

Source: Coding guidelines
apps/webapp/test/idempotencyDedupResidency.test.ts (1)

45-103: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Duplicate seeding helpers across split-seam test files.

seedOrgProjectEnv/seedRun here are re-implemented nearly verbatim in apps/webapp/test/idempotencyKeyConcernLegacyAuthority.test.ts and apps/webapp/test/resetIdempotencyKeyLegacyAuthority.test.ts. Consider extracting a shared heteroPostgresTest fixture-seeding helper module as this residency test suite keeps growing.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: d490b21c-676d-49ba-83a2-67771105b181

📥 Commits

Reviewing files that changed from the base of the PR and between 413a945 and 515b897.

📒 Files selected for processing (50)

apps/webapp/app/runEngine/concerns/idempotencyKeys.server.ts
apps/webapp/app/runEngine/concerns/idempotencyResidency.server.test.ts
apps/webapp/app/runEngine/concerns/idempotencyResidency.server.ts
apps/webapp/app/runEngine/concerns/resolveWaitpointThroughReadThrough.server.ts
apps/webapp/app/runEngine/services/batchTrigger.server.ts
apps/webapp/app/runEngine/services/createBatch.server.ts
apps/webapp/app/runEngine/services/streamBatchItems.server.ts
apps/webapp/app/runEngine/services/triggerFailedTask.server.ts
apps/webapp/app/runEngine/services/triggerTask.server.test.ts
apps/webapp/app/runEngine/services/triggerTask.server.ts
apps/webapp/app/services/archiveBranch.server.ts
apps/webapp/app/services/dashboardAgent.server.ts
apps/webapp/app/services/deleteProject.server.ts
apps/webapp/app/services/realtime/runReader.server.ts
apps/webapp/app/services/realtime/sessions.server.ts
apps/webapp/app/services/runsRepository/clickhouseRunsRepository.server.ts
apps/webapp/app/services/runsRepository/runsRepository.server.ts
apps/webapp/app/v3/services/alerts/deliverAlert.server.ts
apps/webapp/app/v3/services/alerts/performTaskRunAlerts.server.ts
apps/webapp/app/v3/services/batchTriggerV3.server.ts
apps/webapp/app/v3/services/bulk/BulkActionV2.batchReadThrough.server.test.ts
apps/webapp/app/v3/services/bulk/BulkActionV2.batchReadThrough.server.ts
apps/webapp/app/v3/services/bulk/BulkActionV2.server.ts
apps/webapp/app/v3/services/cancelDevSessionRuns.server.ts
apps/webapp/app/v3/services/createCheckpoint.server.ts
apps/webapp/app/v3/services/createTaskRunAttempt.server.ts
apps/webapp/app/v3/services/enqueueDelayedRun.server.ts
apps/webapp/app/v3/services/executeTasksWaitingForDeploy.ts
apps/webapp/app/v3/services/expireEnqueuedRun.server.ts
apps/webapp/app/v3/services/finalizeTaskRun.server.ts
apps/webapp/app/v3/services/resumeBatchRun.server.ts
apps/webapp/test/batchTriggerV3ResidencyInheritance.test.ts
apps/webapp/test/batchTriggerV3StoreRouting.test.ts
apps/webapp/test/bulkActionV2ReadRouting.test.ts
apps/webapp/test/cancelDevSessionRunsStoreRouting.test.ts
apps/webapp/test/engine/streamBatchItems.test.ts
apps/webapp/test/engine/triggerFailedTask.test.ts
apps/webapp/test/engine/triggerTask.test.ts
apps/webapp/test/idempotencyDedupResidency.test.ts
apps/webapp/test/idempotencyKeyConcernLegacyAuthority.test.ts
apps/webapp/test/performTaskRunAlertsStoreRouting.test.ts
apps/webapp/test/realtime/runReaderReadThrough.test.ts
apps/webapp/test/realtime/streamRegistrationRouting.test.ts
apps/webapp/test/resetIdempotencyKeyLegacyAuthority.test.ts
apps/webapp/test/resolveWaitpointThroughReadThrough.readthrough.test.ts
apps/webapp/test/runEngineBatchTriggerStoreRouting.test.ts
apps/webapp/test/runsRepository.readthrough.test.ts
apps/webapp/test/runsRepositoryCpres.test.ts
apps/webapp/test/sessions.readthrough.test.ts
apps/webapp/test/streamLoader.controlPlane.test.ts

d-cs · 2026-07-02T19:10:40Z

Addressed the outside-diff note on idempotencyKeys.server.ts (blockRunWithWaitpoint writing via this.prisma): the block now passes the residency-resolved dedupClient as tx, so the idempotent parent run's waitpoint write lands on the store that owns that parent run rather than the fallback client.

…eration labels Add a pure unit test for ControlPlaneCache covering per-slot round-trips, null-vs-miss distinction, epoch-based invalidation, per-slot key isolation, bounded eviction, and TTL expiry. Add a testcontainer test for probeDistinctDatabases covering distinct clusters, same physical database (with reason), same-cluster-different-database, and fail-closed probe failure. Strip developer-enumeration labels from three existing test files (readThrough step numbers, runEngineHandlers Test-X comments) and rename the run-detail loader read-through test to drop the non-domain "shape 1" name. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… deps apps/webapp/package.json declares @internal/run-ops-database (workspace) and @testcontainers/postgresql but the lockfile importer entry was never regenerated, so pnpm install --frozen-lockfile fails for the webapp. Regenerate the importer. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Enabling RUN_OPS_SPLIT_ENABLED without REALTIME_BACKEND_NATIVE_ENABLED silently breaks realtime: Electric replicates only from the control-plane DB, so NEW-resident (ksuid) runs on the dedicated run-ops DB are invisible and every realtime subscription hangs. Add a boot-time interlock that refuses split mode in that misconfiguration, mirroring the existing distinct-DB data-loss sentinel. The check is a pure predicate (assertSplitRealtimeInterlock) run synchronously inside assertRunOpsSplitSentinel on the same eager-boot path, failing fast before the async DB probe and before any run-ops routing is wired. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
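As a rough illustration of what such a pure predicate could look like: the function name `assertSplitRealtimeInterlock` and the two environment variable names come from the commit message above, while the `SplitBootFlags` shape, the `readSplitBootFlags` helper, the accepted truthy values, and the error text are assumptions made for this sketch.

```typescript
interface SplitBootFlags {
  runOpsSplitEnabled: boolean;
  realtimeBackendNativeEnabled: boolean;
}

// Hypothetical env parsing; which string values count as "enabled" is an assumption.
function readSplitBootFlags(env: Record<string, string | undefined>): SplitBootFlags {
  const isOn = (value: string | undefined) => value === "1" || value === "true";
  return {
    runOpsSplitEnabled: isOn(env.RUN_OPS_SPLIT_ENABLED),
    realtimeBackendNativeEnabled: isOn(env.REALTIME_BACKEND_NATIVE_ENABLED),
  };
}

// Pure and synchronous: no I/O, so it can run before any async DB probe.
function assertSplitRealtimeInterlock(flags: SplitBootFlags): void {
  if (flags.runOpsSplitEnabled && !flags.realtimeBackendNativeEnabled) {
    throw new Error(
      "RUN_OPS_SPLIT_ENABLED requires REALTIME_BACKEND_NATIVE_ENABLED: " +
        "NEW-resident runs would be invisible to realtime subscriptions"
    );
  }
}
```

Keeping the check free of I/O is what lets it fail fast on the eager-boot path before any run-ops routing is wired.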

…n diagnostics

- gate runOpsTopology splitEnabled on RUN_OPS_SPLIT_ENABLED so provisioning both DSNs before flipping the flag cannot open a second pool or route writes ahead of the distinct-DB sentinel
- rethrow the original UnclassifiableRunId in the cross-seam guard so its value/valueLength keep reflecting the real waitpoint id
- log run-found-but-environment-unresolved distinctly from missing-run
- correct the RUN_OPS_DATABASE_URL doc comment (Prisma datasource, not the webapp runtime pool)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ry on CI

…by worker-mock fix)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…uth-env through the cache-first resolver The ControlPlaneCache served env/org data with no invalidation, so admin/control-plane writes were only reflected after the TTL. Add two invalidation scopes to the cache (invalidateEnvironment for one env's slots; invalidateOrganization via a per-org epoch that env/authEnv values are stamped with, so all of an org's cached rows drop with no reverse index), expose them on the resolver, and call them at every write site that mutates cache-served data: pause/resume, archive, env/org concurrency + burst-factor, API-key regeneration, feature flags, API/batch rate limits, runs enable/disable, org + project delete, and stream-basin provisioning. Also extend the resolver's authenticated-env slot to carry `git` and make the run-engine adapter's resolveAuthenticatedEnv delegate to the cache-first, split-aware resolver instead of issuing its own $replica.findFirst, so it honors splitEnabled() and the cache like its siblings while still returning `git` and the deleted-project guard. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
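The per-org epoch idea described in this commit can be sketched as below. This is not the PR's ControlPlaneCache: `EpochStampedCache` and its method signatures are hypothetical, and TTL expiry and bounded eviction are deliberately left out so the invalidation mechanism stays visible.

```typescript
interface CacheEntry<V> {
  value: V;
  organizationId: string;
  epoch: number; // the org's epoch at the time the value was cached
}

type Lookup<V> = { hit: true; value: V } | { hit: false };

class EpochStampedCache<V> {
  private entries = new Map<string, CacheEntry<V>>();
  private orgEpochs = new Map<string, number>();

  private currentEpoch(organizationId: string): number {
    return this.orgEpochs.get(organizationId) ?? 0;
  }

  set(key: string, organizationId: string, value: V): void {
    this.entries.set(key, { value, organizationId, epoch: this.currentEpoch(organizationId) });
  }

  // Returns a tagged result so a cached null is distinguishable from a miss.
  get(key: string): Lookup<V> {
    const entry = this.entries.get(key);
    if (entry === undefined) return { hit: false };
    if (entry.epoch !== this.currentEpoch(entry.organizationId)) {
      this.entries.delete(key); // stale: the org was invalidated after this write
      return { hit: false };
    }
    return { hit: true, value: entry.value };
  }

  // Narrow scope: drop one environment's slot.
  invalidateEnvironment(key: string): void {
    this.entries.delete(key);
  }

  // Wide scope: bump the epoch so every row stamped with the old one is
  // treated as stale, without keeping a reverse index of the org's keys.
  invalidateOrganization(organizationId: string): void {
    this.orgEpochs.set(organizationId, this.currentEpoch(organizationId) + 1);
  }
}
```

Stale rows are dropped lazily on the next read, which is the trade-off for not tracking which keys belong to which organization.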

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… OFF With the split OFF there is a single DB, so a run and its environment are co-located and there is no cross-seam FK/check to replace (matches main). Skip the always-on hot-path read in that branch; the split-ON branch is unchanged (cache-first, throws on a genuinely missing env). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… routing, run lifecycle Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…e in TriggerFailedTaskService TriggerFailedTaskService read the parent run via the ambient module-singleton store while the engine wrote the run through its own store, so a ksuid parent's row was not found and parentTaskRunId came back null. Add an optional injected runStore (defaults to the shared singleton, preserving production behaviour) and resolve the parent through it at both call sites, mirroring triggerTask.server.ts. Align the three affected webapp tests to read through the same store the engine wrote to: triggerFailedTask.test.ts passes engine.runStore; performTaskRunAlerts routing passes a passthrough store over the seeded container; triggerTask.test.ts stubs the run-ops db handles and pins split mode off so the idempotency dedup uses the container client. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…id-shape only

Migration is deferred, so child/batch residency is a pure id-shape check. Remove the isKnownMigrated (and mint-only isSplitEnabled) deps from the mint sites (triggerTask, triggerFailedTask, batchTriggerV3) and call the now-synchronous resolveInheritedMintKind(parentFriendlyId) with no deps arg.

Read paths: drop the isKnownMigrated re-probe-avoidance from the ClickHouse runs hydrate (probe all missing on legacy), the runsRepository readThrough options type, resolveWaitpointThroughReadThrough deps, and the BulkActionV2 batch seam adapter, keeping the genuine cross-seam fallback that reads NEW first for unclassifiable/legacy-candidate ids. Delete the injected-marker test cases; the remaining residency tests assert pure id-shape inheritance.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…s and test names Review hygiene only: remove the NEW-1 label, Test X: name prefixes, and [TEST-NEWSEED] comment label. No product logic or test behavior changed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…o safe run-ops clients The read-through concern defaulted both newClient and legacyReplica to $replica (control-plane), so a bare caller that omits `deps` — the waitpoints wait route — never queried the dedicated run-ops replica. A co-located, NEW-resident waitpoint minted by streams.input().wait() lives on the run-ops-new DB, so the read missed, returned null, and the route 404'd (re-serialized to 500). Match the deps the complete/callback routes pass: default newClient to runOpsNewReplica, legacyReplica to $replica, and splitEnabled to runOpsSplitReadEnabled — mirroring readThroughRun's own self-defaulting. This immunizes any bare caller (present or future) against the control-plane pin, without touching the wait route. The wait/complete/callback call sites live on a higher branch and are unchanged; complete/callback keep their explicit deps (now redundant but harmless). Adds a heteroRunOps regression case driving the concern with no `deps` via the `defaults` DI seam: proves the old $replica default misses a NEW-resident waitpoint (null) while the safe run-ops default finds it. No mocks; the fallback is exercised against real PG14/PG17 containers. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
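The self-defaulting pattern from this commit might look roughly like the sketch below. It is illustrative only: `resolveWaitpointSketch`, `WaitpointReader`, and `readerOver` are hypothetical, the lookups are synchronous in-memory stand-ins (real database clients would be async), and the split-off branch reading only the legacy replica is an assumption.

```typescript
interface Waitpoint {
  id: string;
}

// Stand-in for a database client; a real one would return a Promise.
interface WaitpointReader {
  findWaitpoint(id: string): Waitpoint | null;
}

interface ResolveDeps {
  newClient: WaitpointReader;
  legacyReplica: WaitpointReader;
  splitEnabled: () => boolean;
}

// Each dependency falls back to the module defaults individually, so a bare
// caller that passes no deps still reads through the safe clients.
function resolveWaitpointSketch(
  id: string,
  defaults: ResolveDeps,
  deps: Partial<ResolveDeps> = {}
): Waitpoint | null {
  const newClient = deps.newClient ?? defaults.newClient;
  const legacyReplica = deps.legacyReplica ?? defaults.legacyReplica;
  const splitEnabled = deps.splitEnabled ?? defaults.splitEnabled;

  if (!splitEnabled()) return legacyReplica.findWaitpoint(id);
  // Read NEW first, then fall back to the legacy replica.
  return newClient.findWaitpoint(id) ?? legacyReplica.findWaitpoint(id);
}

// In-memory reader used only to exercise the sketch.
function readerOver(ids: string[]): WaitpointReader {
  const known = new Set(ids);
  return { findWaitpoint: (id) => (known.has(id) ? { id } : null) };
}
```

With defaults that point both clients at the control-plane replica, a NEW-resident waitpoint is missed; with the run-ops replica as the default `newClient`, the same bare call finds it.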

…rvice prisma to the resolved store

- Block the idempotent parent run's waitpoint via the residency-resolved dedup client instead of the fallback prisma, so the write lands on the store that owns the parent run.
- Pass the caller-provided _prisma into WithRunEngine so a custom store isn't silently overridden by the module singleton.
- Throw when a run-backed alert's environment can't be resolved instead of marking it SENT, so a transient replica miss doesn't permanently suppress the alert.
- Pin splitEnabled:false in the waitpoint passthrough test so it exercises single-DB behaviour rather than relying on ksuid residency.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The write-path split added static `runOpsLegacyPrisma`/`runOpsNewPrisma` imports to idempotencyKeys.server.ts, which this test loads. vitest validates every named import against the `~/db.server` mock, so the mock now errored on the missing run-ops singletons. Add the four run-ops exports (empty stubs, same boundary pattern as the batchTriggerV3 residency test) and pin isSplitEnabled() to false so the dedup routing deterministically returns the injected fake prisma regardless of the ambient RUN_OPS_SPLIT_ENABLED. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…setup Worker/engine/marqs/pubsub/socket singletons each construct an ioredis client at import time (singleton() + no lazyConnect), so any test importing the service graph opened real Redis connections on import. In CI there is no Redis, so these accumulate infinite-retry clients across a shard and take the suite down (locally they pass only because dev Redis is up). Globally mock the eager-Redis modules to no-op stubs in test/setup.ts: commonWorker, batchTriggerWorker, legacyRunEngineWorker, alertsWorker, the RunEngine and MarQS singletons, devPubSub and the socket.io server. Only these singletons are mocked — never the run store (~/v3/runStore.server, ~/db.server), which store-routing/residency tests need real against testcontainer Postgres. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…yConnect + stub runtime Redis singletons

The setup-file mocks of the six eager worker/engine singletons were not enough: CI shards still flooded ECONNREFUSED/maxRetries. Two further classes of env-Redis usage survived them, reproduced locally by running the failing shards with REDIS_PORT pointed at a dead port:

1. Import-time construction: ~15 more singletons (platform cache, billing-limit reconcile queue, alerts rate limiter, DevPresence, auto-increment counter, s2 token cache, v1 streams cache, ...) build ioredis clients at module import, and ioredis dials on construction. A global ioredis mock now forces lazyConnect: true so clients only dial on first command; testcontainer-backed tests are unaffected (their first command connects as before).
2. Runtime commands inside code under test: tracePubSub.publish() (eventRepository writes), alertsRateLimiter.check() (deliverAlert) and the task metadata cache each issue commands against env-configured Redis mid-test; every command burns ~20 reconnect cycles before its error surfaces, which times the tests out. These three modules are now stubbed (metadata cache pinned to its Noop implementation, which is what CI's unset env resolves to anyway).

Verified: webapp shards 2/5/6/8 (the ones failing on the pr06+ stack) run green with Redis pointed at a dead port, and shards 2/8 stay green against live Redis (store-routing suites still exercise the real run store).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…in CI

CI runners have no .env, no REDIS_HOST/REDIS_PORT, and no Postgres at localhost:5432, which surfaced two failure layers that local runs mask (the dev stack answers on both):

- suites transitively importing triggerTaskV1.server failed to collect because autoIncrementCounter.server.ts throws at import when REDIS_HOST/REDIS_PORT are unset (shards 2/5/6). Default the pair in test/setup.ts; the global ioredis lazyConnect mock means nothing dials.
- TriggerFailedTaskService.call() resolved its event repository via getEventRepository → global prisma (feature-flag read + Prisma event repo), so in CI the swallowed connect error returned null friendlyIds (shard 8). Allow injecting the repository/store pair and bind the test to an EventRepository over the testcontainer DB.
- once the cancelDevSessionRuns suite could collect, findLatestSession's hardwired global $replica was the next masked layer; give it an injectable client (defaulting to $replica) and pass the service's _replica through.

Verified by replaying the exact CI env locally (.env hidden, workflow env vars, dead localhost DB, GITHUB_ACTIONS set): all four failing suites and full shards 2/5/6/8 reproduce the CI failures before and pass after.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…method access Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…am regressions

Two regression tests for the write-path read seams:

- runsRepository: paginating the full keyset over interleaved cuid/ksuid runs enumerates every id once, no empty page, in ClickHouse (created_at DESC, run_id DESC) order -- fails if hydration reverts to lexical id desc across the id-space seam.
- runReader: a NEW-resident (ksuid) run's terminal metadata hydrates through the owning store, never a generic legacy replica.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
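The ordering property the first regression test guards can be illustrated with a small in-memory sketch. `RunRow`, `compareKeysetDesc`, `nextPage`, and `enumerateAll` are hypothetical names, and plain string comparison stands in for the database's id ordering, which is an assumption that holds only for ASCII ids.

```typescript
interface RunRow {
  runId: string;
  createdAt: number; // epoch milliseconds
}

// Orders by (createdAt DESC, runId DESC); the id only breaks timestamp ties.
function compareKeysetDesc(a: RunRow, b: RunRow): number {
  if (a.createdAt !== b.createdAt) return b.createdAt - a.createdAt;
  if (a.runId === b.runId) return 0;
  return a.runId > b.runId ? -1 : 1;
}

// Keyset pagination: return the rows that sort strictly after the cursor.
function nextPage(rows: RunRow[], cursor: RunRow | undefined, pageSize: number): RunRow[] {
  const ordered = [...rows].sort(compareKeysetDesc);
  const remaining =
    cursor === undefined ? ordered : ordered.filter((row) => compareKeysetDesc(row, cursor) > 0);
  return remaining.slice(0, pageSize);
}

// Walks every page until an empty one signals the end of the keyset.
function enumerateAll(rows: RunRow[], pageSize: number): RunRow[] {
  const seen: RunRow[] = [];
  let cursor: RunRow | undefined;
  for (;;) {
    const page = nextPage(rows, cursor, pageSize);
    if (page.length === 0) return seen;
    seen.push(...page);
    cursor = page[page.length - 1];
  }
}
```

Sorting by id alone would group all cuid-shaped ids apart from all ksuid-shaped ids regardless of creation time, which is the regression the test description warns about.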

d-cs mentioned this pull request Jul 2, 2026

feat(run-ops): ClickHouse multi-source replication fan-in + admin ops #4119

Draft

d-cs self-assigned this Jul 2, 2026

devin-ai-integration Bot reviewed Jul 2, 2026

Comment thread apps/webapp/app/runEngine/concerns/idempotencyKeys.server.ts

d-cs force-pushed the runops/pr05-webapp-foundation branch from 413a945 to 99643f8 Compare July 2, 2026 18:02

d-cs force-pushed the runops/pr06-write-path branch from 515b897 to cb97148 Compare July 2, 2026 18:02

coderabbitai Bot reviewed Jul 2, 2026

d-cs force-pushed the runops/pr05-webapp-foundation branch from 26871d5 to cdc4eb9 on July 2, 2026 19:25

d-cs force-pushed the runops/pr06-write-path branch from c59d9c5 to d5d7fa1 on July 2, 2026 19:25

d-cs force-pushed the runops/pr05-webapp-foundation branch from cdc4eb9 to e0b35d5 on July 2, 2026 20:21

d-cs force-pushed the runops/pr06-write-path branch 3 times, most recently from 0db90f0 to d5415e8 on July 2, 2026 21:44

d-cs force-pushed the runops/pr05-webapp-foundation branch from 8024e36 to f9b9b0b on July 3, 2026 08:51

d-cs force-pushed the runops/pr06-write-path branch from aa55b6b to 3153bc4 on July 3, 2026 08:51

d-cs force-pushed the runops/pr05-webapp-foundation branch from f9b9b0b to 0937b15 on July 3, 2026 10:02

d-cs force-pushed the runops/pr06-write-path branch from 3153bc4 to d561590 on July 3, 2026 10:02

d-cs force-pushed the runops/pr05-webapp-foundation branch from 0937b15 to 729daf1 on July 3, 2026 10:36

d-cs force-pushed the runops/pr06-write-path branch from d561590 to 9e7c367 on July 3, 2026 10:36

d-cs force-pushed the runops/pr05-webapp-foundation branch from 729daf1 to bd6fc79 on July 3, 2026 10:44

d-cs force-pushed the runops/pr06-write-path branch from 9e7c367 to e23432d on July 3, 2026 10:44

d-cs force-pushed the runops/pr05-webapp-foundation branch from bd6fc79 to a7e0846 on July 3, 2026 11:08

d-cs force-pushed the runops/pr06-write-path branch from e23432d to 8dff8b2 on July 3, 2026 11:08

d-cs force-pushed the runops/pr05-webapp-foundation branch from a7e0846 to 4119616 on July 3, 2026 12:08

d-cs force-pushed the runops/pr06-write-path branch 2 times, most recently from 891d81a to 5140cbc on July 3, 2026 15:42

d-cs force-pushed the runops/pr05-webapp-foundation branch from d087c25 to b554794 on July 3, 2026 16:33

d-cs and others added 27 commits July 3, 2026 17:43

style(run-ops): apply oxfmt

72ace5c

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

test(webapp): cap vitest fork concurrency to bound testcontainer memory on CI

cac7c11

chore(webapp): drop vitest maxForks cap (broke typecheck; superseded by worker-mock fix)

70e54d5

chore: add server-changes for pr05

009c4cf

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

chore(server-changes): consolidate pr05 run-ops split entries into one

bd8cd5e

chore(run-ops): fix lint/format for main lint rules

32830f5

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

feat(run-ops): webapp write path — trigger/batch minting, idempotency routing, run lifecycle

18333e8

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

style(run-ops): apply oxfmt

38dccdc

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

chore: add server-changes for pr06

cf28e78

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

chore(run-ops): fix lint/format for main lint rules

1f075bb

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

fix(run-ops test): make engine/marqs no-op mock recursive for nested method access

25f8e5f

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

d-cs force-pushed the runops/pr05-webapp-foundation branch from b554794 to 071cdc1 on July 3, 2026 16:44

d-cs force-pushed the runops/pr06-write-path branch from f8f3096 to 4bda37a on July 3, 2026 16:44

Base automatically changed from runops/pr05-webapp-foundation to main July 3, 2026 17:02

feat(run-ops): webapp write path — trigger/batch minting, idempotency routing, run lifecycle #4118

d-cs wants to merge 29 commits into main from runops/pr06-write-path

changeset-bot Bot commented Jul 2, 2026 (edited)

⚠️ No Changeset found

coderabbitai Bot commented Jul 2, 2026 (edited)

Review skipped

pkg-pr-new Bot commented Jul 2, 2026 (edited)
