fix(realtime): surface async postgres_changes system error as channelError #1470
Merged
spydon merged 1 commit on Jun 25, 2026
spydon approved these changes on Jun 25, 2026
spydon (Contributor) left a comment:
LGTM, thanks for your contribution!
…Error

A `postgres_changes` channel can report `RealtimeSubscribeStatus.subscribed` and stay joined while delivering zero events, indefinitely, when the server-side replication subscription fails *after* the phoenix join succeeds (e.g. RLS denies under a stale/expired access token, or the server declines). That failure arrives as an asynchronous `system` event with `status: error`, which the SDK only exposes via `onSystemEvents` (documented "for debugging purposes") and never forwards to the `.subscribe()` status callback. The result is a silent "zombie" channel: green/connected to the app, but dead.

The join reply optimistically echoes the requested `postgres_changes` config (with server-assigned binding ids), so binding reconciliation succeeds and `subscribed` fires before the replication is confirmed live. The real verdict arrives on the later `system` event, which the high-level API drops.

Forward the `system` error to the subscribe callback as `channelError` so a failed `postgres_changes` subscription becomes a detectable, retryable error rather than a silent zombie. The happy path (`system status: ok`) is ignored, so subscriptions that succeed are unaffected.

Closes supabase#1466

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
b324ebd to
9cf0df7
Contributor
Author
My pleasure! Glad to be a part of the community!
Summary
Fixes #1466.
A `postgres_changes` channel can report `RealtimeSubscribeStatus.subscribed` and stay joined while delivering zero events, indefinitely, when the server-side replication subscription fails after the phoenix join succeeds. The failure is delivered as an asynchronous `system` event with `status: error`, but the SDK only exposes that via `onSystemEvents(...)` (documented "for debugging purposes") and never forwards it to the `.subscribe()` status callback. The result is a silent "zombie" channel: green/connected to the app, but dead.

This is especially painful on reconnect/resume flows where the access token may be stale (e.g. the refresh timer was throttled while the app was idle/backgrounded). The channel re-subscribes, the join succeeds, the app marks it healthy, and it silently delivers nothing until the next reconnect. It bit us as a user-facing incident: live data silently stopped updating after a reconnect.
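To make the failure mode concrete, here is a minimal, self-contained Dart model of the dispatch described above. It is a sketch, not the SDK's code: `ModelChannel`, `SubscribeStatus`, `onJoinOk`, `onSystemEvent`, and the message text are hypothetical names chosen for illustration.

```dart
// Hypothetical stand-in for RealtimeSubscribeStatus; not the SDK enum.
enum SubscribeStatus { subscribed, channelError, closed, timedOut }

// Hypothetical model of a channel with two separate delivery paths.
class ModelChannel {
  // What the .subscribe() status callback would have received.
  final List<String> statusLog = [];
  // What an onSystemEvents listener would have received.
  final List<String> systemLog = [];

  // The phoenix join reply is what drives the status callback.
  void onJoinOk() => statusLog.add(SubscribeStatus.subscribed.name);

  // Before the fix, system events reach only the debugging listener.
  void onSystemEvent(Map<String, dynamic> payload) =>
      systemLog.add('${payload['status']}: ${payload['message']}');
}

void main() {
  final channel = ModelChannel();
  channel.onJoinOk();
  // Assumed payload shape; the message string is made up for the example.
  channel.onSystemEvent({'status': 'error', 'message': 'subscription failed'});

  print('status callback saw: ${channel.statusLog}');
  print('system listener saw: ${channel.systemLog}');
}
```

In this model the status log contains only `subscribed`, while the error is visible only in the system log, which is the zombie state the summary describes.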
Root cause
`postgres_changes` is an extension whose replication subscription is set up asynchronously by the server, after the channel join. `.subscribe()`'s status is driven by the phoenix join (joined/errored/closed/timeout), and there is no `RealtimeSubscribeStatus` value for "joined, but the extension subscription failed." The join reply optimistically echoes the requested `postgres_changes` config (with server-assigned binding ids), so binding reconciliation succeeds and `subscribed` fires before the replication is confirmed live. The real verdict arrives on a later `system status=error` event, which the high-level API drops on the floor.

Wire sequence with a stale/expired token (`auth.uid()` no longer satisfies the policy):

Change
In `RealtimeChannel.subscribe`, register an `onSystemEvents` listener that forwards events with `payload['status'] == 'error'` to the subscribe callback as `RealtimeSubscribeStatus.channelError` (carrying the server's message). This is the direction suggested in #1466 and approved by @spydon.

- A failed `postgres_changes` subscription becomes a detectable, retryable error instead of a silent zombie.
- The happy path (`system status: ok`) is ignored, so subscriptions that succeed are unaffected.
- … `channelError`.

Tests
Added an `onSystemEvents` group to `channel_test.dart`:

- forwards `system status=error` to the subscribe callback as `channelError`, carrying the server message;
- a `system status=ok` event is not surfaced as an error (happy path unaffected).

Verification
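As a way to reason about the forwarding rule without a live Realtime server, the two cases covered by the tests can be modeled in a few lines of plain Dart. This is a sketch under stated assumptions, not the code in this PR: `SubscribeStatus`, `statusForSystemEvent`, `subscribeCallback`, and the message strings are hypothetical, and only the `status` and `message` payload keys are taken from the description.

```dart
// Hypothetical stand-in for RealtimeSubscribeStatus; not the SDK enum.
enum SubscribeStatus { subscribed, channelError }

/// Returns the status to report for a `system` payload, or null when the
/// event should not reach the subscribe callback (for example status: ok).
SubscribeStatus? statusForSystemEvent(Map<String, dynamic> payload) {
  return payload['status'] == 'error' ? SubscribeStatus.channelError : null;
}

void main() {
  final seen = <String>[];

  // Hypothetical subscribe callback that records what it is told.
  void subscribeCallback(SubscribeStatus status, Object? error) =>
      seen.add('${status.name}: $error');

  // Assumed payloads; the message strings are made up for the example.
  for (final payload in [
    {'status': 'ok', 'message': 'subscribed to changes'},
    {'status': 'error', 'message': 'subscription failed'},
  ]) {
    final status = statusForSystemEvent(payload);
    // Only the error payload is forwarded, carrying the server message.
    if (status != null) subscribeCallback(status, payload['message']);
  }

  print(seen);
}
```

Keeping the decision in a pure function makes both cases easy to check in isolation: the `ok` payload maps to null and never reaches the callback, while the `error` payload maps to `channelError` with its message.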