Non-atomic final assistant persist / activeStreamId clear can replay the response on refresh

## Summary

In `apps/web/app/workflows/chat.ts`, the current success path on `main` persists the final assistant snapshot before clearing `chats.activeStreamId`. Those are two separate DB writes with no ordering guarantee beyond control flow:

- the DB already contains the final assistant message
- `activeStreamId` still points at the finishing workflow run

That intermediate state is observable by the client on refresh and produces a user-visible duplicate-response bug. It also leaves a silent-loss path because `persistAssistantMessage` catches DB errors and continues.

Simplified current success path:

```ts
await persistAssistantMessage(options.chatId, pendingAssistantResponse);

// auto-commit / auto-PR / sandbox state updates may run here
// and extend the window significantly

if (didUpdateGitData) {
  await persistAssistantMessage(options.chatId, pendingAssistantResponse);
}

await Promise.all([
  clearActiveStream(options.chatId, workflowRunId),
  sendFinish(writable).then(() => closeStream(writable)),
]);
```

## Failure mode 1 — Duplicate final response on refresh

Between the last successful assistant persist and `clearActiveStream(...)`:

- the page loads `initialMessages` from DB, including the final assistant message
- `chat.activeStreamId` is still non-null, so the client resumes `/api/chat/{id}/stream`
- the durable workflow stream replays the run on top of the already-hydrated assistant message
- on the AI SDK version used here, at minimum reasoning parts are reconciled duplicates on replay (see #545); in practice the full final response visibly renders a second time

## Failure mode 2 — Final assistant state can be silently lost

`persistAssistantMessage(...)` currently catches DB errors and only logs them.

If the final assistant persist fails, but the workflow still proceeds to `clearActiveStream(...)` and sends `finish`, the chat can end up in this state:

- no final assistant snapshot in DB
- `activeStreamId` already cleared
- the client already saw the response live and transitioned to a finished state

After a hard refresh, the final assistant response is missing from durable chat history and there is no active stream left to resume.

## Reproduction

### Natural path (no code changes)

1. Enable `autoCommitEnabled` and/or `autoCreatePrEnabled` on a chat with repo context.
2. In `chat.ts` the final assistant persist runs **before** the auto-commit / auto-PR steps, while `clearActiveStream(...)` runs **after** them — the window is the full duration of those steps (seconds to tens of seconds depending on repo size and network).
3. Send a prompt, wait until the assistant text is visible, then hard refresh during the auto-commit / auto-PR phase.
4. Observe the same assistant response rendered a second time on top of the DB-hydrated message.

### Deterministic repro

1. Insert `await new Promise(r => setTimeout(r, 3000));` immediately before the final `Promise.all([ clearActiveStream(...), sendFinish(...), ... ])` in the workflow success path.
2. Send a prompt and wait until the assistant response is visible.
3. Hard refresh during the sleep.
4. Same duplicate rendering as above, reliably.

### Silent-loss path

Force the final `upsertChatMessageScoped(...)` write to fail (e.g. inject a thrown error in `persistAssistantMessage`). The workflow still clears the slot and sends `finish`, and the final assistant message is not durable.

## Why ordering alone doesn't fix this

Swapping the order just moves the bad window:

- `persist` first, `clear` later: duplicate-on-refresh window
- `clear` first, `persist` later: missing-message / silent-loss window if persist fails

The invariant that matters is:

- once the final assistant snapshot is durable, that run should no longer be resumable
- while the run is still resumable, the final assistant snapshot should not yet be durable

Only a single DB transaction closes that gap.

## Proposed fix

Add a helper that atomically upserts the final assistant snapshot and clears the active stream slot with CAS semantics:

```ts
export async function upsertChatMessageAndClearActiveStream(
  data: NewChatMessage,
  workflowRunId: string,
): Promise<UpsertChatMessageScopedResult> {
  return db.transaction(async (tx) => {
    // existing scoped upsert logic...

    await tx
      .update(chats)
      .set({ activeStreamId: null })
      .where(
        and(
          eq(chats.id, data.chatId),
          eq(chats.activeStreamId, workflowRunId),
        ),
      );

    return result;
  });
}
```

Then in the workflow:

- keep `activeStreamId` claimed until the last point the assistant message can still change (i.e. after any auto-commit / auto-PR data-part updates)
- replace the final `persistAssistantMessage(...)` + `clearActiveStream(...)` pair with the transactional helper
- after that transaction commits, emit `finish` and close the stream
- keep bare `clearActiveStream(...)` only as a fallback for abort / error paths where there is no final assistant snapshot to persist

This also makes the silent-loss path benign: if the transaction fails, `activeStreamId` stays set and the client can resume on refresh instead of landing in a "no message, no stream" terminal state.

## Relation to existing issues

This is narrower than #526 (rethink long-running stream replay) and orthogonal to #545 (reasoning-part duplication on resume):

- #526 is about whether internal execution should be replayed to the client at all.
- #545 is about specific chunk types (`rs_*` reasoning) being duplicated across replay.
- This issue is about the **atomicity of the terminal state transition** itself. Even if #526 lands (no more internal replay) and #545 lands (reasoning dedup), the final durable assistant snapshot and the resumability flag should still flip atomically, so a refresh never sees "snapshot durable + still resumable" or "snapshot lost + no longer resumable".

A single DB transaction is sufficient and independent of the transport-level decisions in #526 / #545.

## Notes

Observed and verified locally on `main`. Happy to turn the fix into a PR if this aligns with your preferred direction.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Non-atomic final assistant persist / activeStreamId clear can replay the response on refresh #845

Summary

Failure mode 1 — Duplicate final response on refresh

Failure mode 2 — Final assistant state can be silently lost

Reproduction

Natural path (no code changes)

Deterministic repro

Silent-loss path

Why ordering alone doesn't fix this

Proposed fix

Relation to existing issues

Notes

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Non-atomic final assistant persist / activeStreamId clear can replay the response on refresh #845

Description

Summary

Failure mode 1 — Duplicate final response on refresh

Failure mode 2 — Final assistant state can be silently lost

Reproduction

Natural path (no code changes)

Deterministic repro

Silent-loss path

Why ordering alone doesn't fix this

Proposed fix

Relation to existing issues

Notes

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions