Summary
In apps/web/app/workflows/chat.ts, the current success path on main persists the final assistant snapshot before clearing chats.activeStreamId. Those are two separate DB writes with no ordering guarantee beyond control flow:
the DB already contains the final assistant message
activeStreamId still points at the finishing workflow run
That intermediate state is observable by the client on refresh and produces a user-visible duplicate-response bug. It also leaves a silent-loss path because persistAssistantMessage catches DB errors and continues.
Simplified current success path:
```ts
await persistAssistantMessage(options.chatId, pendingAssistantResponse);

// auto-commit / auto-PR / sandbox state updates may run here
// and extend the window significantly

if (didUpdateGitData) {
  await persistAssistantMessage(options.chatId, pendingAssistantResponse);
}

await Promise.all([
  clearActiveStream(options.chatId, workflowRunId),
  sendFinish(writable).then(() => closeStream(writable)),
]);
```
Failure mode 1 — Duplicate final response on refresh
Between the last successful assistant persist and clearActiveStream(...):
the page loads initialMessages from DB, including the final assistant message
chat.activeStreamId is still non-null, so the client resumes /api/chat/{id}/stream
the durable workflow stream replays the run on top of the already-hydrated assistant message
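The window can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: `ChatRow`, `loadPage`, and `replay` are hypothetical stand-ins, not the app's real schema or client code, and the replay is simplified to appending the run's output after the hydrated messages.

```typescript
// Hypothetical in-memory model of the refresh race. Only the field name
// activeStreamId is taken from the issue; everything else is illustrative.
interface ChatRow {
  messages: string[];            // durable chat history
  activeStreamId: string | null; // non-null means a run is still considered live
}

// What the client renders after a hard refresh: DB-hydrated messages first,
// then whatever the resumed stream replays on top of them.
function loadPage(chat: ChatRow, replayOf: (runId: string) => string[]): string[] {
  const rendered = [...chat.messages];
  if (chat.activeStreamId !== null) {
    rendered.push(...replayOf(chat.activeStreamId));
  }
  return rendered;
}

// Assumed behaviour: resuming a run replays its assistant output.
const replay = (_runId: string): string[] => ["final answer"];

// State between the final persist and clearActiveStream(...).
const midWindow: ChatRow = { messages: ["final answer"], activeStreamId: "run_1" };
// State after clearActiveStream(...) has committed.
const afterClear: ChatRow = { messages: ["final answer"], activeStreamId: null };

const duringWindow = loadPage(midWindow, replay); // the answer appears twice
const afterWindow = loadPage(afterClear, replay); // the answer appears once
```

In this model the duplicate exists only while both conditions hold at once, which is why the length of the window matters.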
Failure mode 2 — Final assistant state can be silently lost
persistAssistantMessage(...) currently catches DB errors and only logs them.
If the final assistant persist fails, but the workflow still proceeds to clearActiveStream(...) and sends finish, the chat can end up in this state:
no final assistant snapshot in DB
activeStreamId already cleared
the client already saw the response live and transitioned to a finished state
After a hard refresh, the final assistant response is missing from durable chat history and there is no active stream left to resume.
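A minimal sketch of how a swallowed persist error leads to that terminal state. The names `ChatState`, `writeMessage`, `persistSwallowing`, and `finishRun` are hypothetical; the sketch only mirrors the described catch-log-continue behaviour and is not the real `persistAssistantMessage`.

```typescript
// Hypothetical model of a persist helper that logs and swallows DB errors.
interface ChatState {
  messages: string[];
  activeStreamId: string | null;
}

const logged: string[] = [];

// Stand-in for the DB write; `fail` simulates a failing database.
function writeMessage(chat: ChatState, text: string, fail: boolean): void {
  if (fail) throw new Error("db unavailable");
  chat.messages.push(text);
}

// Mirrors the described behaviour: catch, log, continue.
function persistSwallowing(chat: ChatState, text: string, fail: boolean): void {
  try {
    writeMessage(chat, text, fail);
  } catch (err) {
    logged.push((err as Error).message);
  }
}

// Simplified success path: persist, then clear the slot unconditionally.
function finishRun(chat: ChatState, text: string, fail: boolean): void {
  persistSwallowing(chat, text, fail); // the failure is invisible to the caller
  chat.activeStreamId = null;          // the slot is cleared regardless
}

const lossy: ChatState = { messages: [], activeStreamId: "run_1" };
finishRun(lossy, "final answer", true);
// lossy now has no messages and no active stream: nothing to show, nothing to resume.
```

Because the caller never learns that the write failed, it has no reason to keep the stream slot claimed.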
Reproduction
Natural path (no code changes)
Enable autoCommitEnabled and/or autoCreatePrEnabled on a chat with repo context.
In chat.ts the final assistant persist runs before the auto-commit / auto-PR steps, while clearActiveStream(...) runs after them — the window is the full duration of those steps (seconds to tens of seconds depending on repo size and network).
Send a prompt, wait until the assistant text is visible, then hard refresh during the auto-commit / auto-PR phase.
Observe the same assistant response rendered a second time on top of the DB-hydrated message.
Deterministic repro
Insert await new Promise(r => setTimeout(r, 3000)); immediately before the final Promise.all([ clearActiveStream(...), sendFinish(...), ... ]) in the workflow success path.
Send a prompt and wait until the assistant response is visible.
Hard refresh during the sleep.
Same duplicate rendering as above, reliably.
Silent-loss path
Force the final upsertChatMessageScoped(...) write to fail (e.g. inject a thrown error in persistAssistantMessage). The workflow still clears the slot and sends finish, and the final assistant message is not durable.
Why ordering alone doesn't fix this
Swapping the order just moves the bad window:
persist first, clear later: duplicate-on-refresh window
clear first, persist later: missing-message / silent-loss window if persist fails
The invariant that matters is that the final assistant message is persisted and activeStreamId is cleared as one atomic step, so neither intermediate state is ever observable.
Only a single DB transaction closes that gap.
Proposed fix
Add a helper that atomically upserts the final assistant snapshot and clears the active stream slot with CAS semantics:
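One possible shape for such a helper, sketched against an in-memory record rather than a real database. `finalizeRun`, `ChatRecord`, and `FinalizeResult` are hypothetical names, and the staged copy stands in for a real DB transaction; in an actual implementation the same effect would presumably come from one transaction containing the upsert plus an update conditioned on the current stream id. Treating a mismatched run id as a no-op is an assumption of this sketch, not something the issue specifies.

```typescript
// Hypothetical sketch: an in-memory "transaction" standing in for a real
// DB transaction. None of these names exist in the codebase.
interface ChatRecord {
  messages: Map<string, string>; // messageId -> content
  activeStreamId: string | null;
}

type FinalizeResult = "committed" | "stale" | "failed";

function finalizeRun(
  chat: ChatRecord,
  runId: string,
  messageId: string,
  content: string,
  // Injectable write so a failure can be simulated.
  write: (m: Map<string, string>, id: string, c: string) => void =
    (m, id, c) => { m.set(id, c); },
): FinalizeResult {
  // CAS guard: only the run that still owns the slot may finalize.
  if (chat.activeStreamId !== runId) return "stale";

  // Work on a copy so a failed write leaves the record untouched.
  const staged = new Map(chat.messages);
  try {
    write(staged, messageId, content);
  } catch {
    return "failed"; // rollback: no message stored, activeStreamId still set
  }

  // "Commit": both changes become visible together.
  chat.messages = staged;
  chat.activeStreamId = null;
  return "committed";
}

const ok: ChatRecord = { messages: new Map(), activeStreamId: "run_1" };
const okResult = finalizeRun(ok, "run_1", "msg_1", "final answer");

const broken: ChatRecord = { messages: new Map(), activeStreamId: "run_1" };
const brokenResult = finalizeRun(broken, "run_1", "msg_1", "final answer", () => {
  throw new Error("db unavailable");
});

const stale: ChatRecord = { messages: new Map(), activeStreamId: "run_2" };
const staleResult = finalizeRun(stale, "run_1", "msg_1", "final answer");
```

A caller would emit finish and close the stream only on "committed". On "failed" the slot stays claimed, so a refresh can still resume the run.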
Then in the workflow:
keep activeStreamId claimed until the last point the assistant message can still change (i.e. after any auto-commit / auto-PR data-part updates)
replace the final persistAssistantMessage(...) + clearActiveStream(...) pair with the transactional helper
after that transaction commits, emit finish and close the stream
keep bare clearActiveStream(...) only as a fallback for abort / error paths where there is no final assistant snapshot to persist
This also makes the silent-loss path benign: if the transaction fails, activeStreamId stays set and the client can resume on refresh instead of landing in a "no message, no stream" terminal state.
Relation to existing issues
This is narrower than #526 (rethink long-running stream replay) and orthogonal to #545 (reasoning-part duplication on resume, i.e. rs_* reasoning being duplicated across replay).
A single DB transaction is sufficient and independent of the transport-level decisions in #526 / #545.
Notes
Observed and verified locally on main. Happy to turn the fix into a PR if this aligns with your preferred direction.