fix(realtime): surface async postgres_changes system error as channelError #1470

Merged: spydon merged 1 commit into supabase:main from DayLight-Creative-Technologies:fix/realtime-postgres-changes-system-error on Jun 25, 2026

@daylightcreative

Contributor

Summary

Fixes #1466.

A postgres_changes channel can report RealtimeSubscribeStatus.subscribed and stay joined while delivering zero events, indefinitely, when the server-side replication subscription fails after the Phoenix join succeeds. The failure is delivered as an asynchronous system event with status: error, but the SDK only exposes that via onSystemEvents(...) (documented "for debugging purposes") and never forwards it to the .subscribe() status callback. The result is a silent "zombie" channel: green/connected to the app, but dead.

This is especially painful on reconnect/resume flows where the access token may be stale (e.g. the refresh timer was throttled while the app was idle/backgrounded). The channel re-subscribes, the join succeeds, the app marks it healthy, and it silently delivers nothing until the next reconnect. It bit us as a user-facing incident: live data silently stopped updating after a reconnect.

Root cause

postgres_changes is an extension whose replication subscription is set up asynchronously by the server, after the channel join. .subscribe()'s status is driven by the Phoenix join (joined/errored/closed/timeout), and there is no RealtimeSubscribeStatus value for "joined, but the extension subscription failed." The join reply optimistically echoes the requested postgres_changes config (with server-assigned binding ids), so binding reconciliation succeeds and subscribed fires before the replication is confirmed live. The real verdict arrives on a later system status=error event, which the high-level API silently drops.

Wire sequence with a stale/expired token (auth.uid() no longer satisfies the policy):

<- phx_reply   status=ok    response={postgres_changes:[{id, event:INSERT, schema, table, filter}]}
   // subscribe() callback fires RealtimeSubscribeStatus.subscribed here
<- system      status=error {message:"Unable to subscribe to changes with given parameters ... ERROR P0001 ..."}
   // ^ swallowed -- never reaches the subscribe() callback. No rows ever arrive.
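
To make the sequence above concrete, here is a minimal, self-contained Dart sketch of the pre-fix behaviour. It is a toy model, not SDK code: `SketchStatus`, `Frame`, and `replayBeforeFix` are hypothetical names introduced only for illustration, and the only assumption taken from the description is that the join reply alone drives the subscribe callback.

```dart
// Hypothetical stand-ins for illustration; these are NOT the SDK's types.
enum SketchStatus { subscribed, channelError }

class Frame {
  final String event; // 'phx_reply' or 'system'
  final String status; // 'ok' or 'error'
  final String? message;
  const Frame(this.event, this.status, [this.message]);
}

/// Replays incoming frames the way the description says the pre-fix
/// high-level API behaves: only the join reply reaches the callback.
List<SketchStatus> replayBeforeFix(List<Frame> frames) {
  final seen = <SketchStatus>[];
  for (final frame in frames) {
    if (frame.event == 'phx_reply') {
      seen.add(frame.status == 'ok'
          ? SketchStatus.subscribed
          : SketchStatus.channelError);
    }
    // 'system' frames are not mapped to any status here, so a later
    // status=error never shows up in what the app observes.
  }
  return seen;
}

void main() {
  const frames = [
    Frame('phx_reply', 'ok'),
    Frame('system', 'error',
        'Unable to subscribe to changes with given parameters'),
  ];
  // The app only ever observes the optimistic join result.
  print(replayBeforeFix(frames)); // [SketchStatus.subscribed]
}
```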

Change

In RealtimeChannel.subscribe, register an onSystemEvents listener that forwards payload['status'] == 'error' to the subscribe callback as RealtimeSubscribeStatus.channelError (carrying the server's message). This is the direction suggested in #1466 and approved by @spydon.

  • A failed postgres_changes subscription becomes a detectable, retryable error instead of a silent zombie.
  • The happy path (system status: ok) is ignored, so subscriptions that succeed are unaffected.
  • No public API change and no new enum value — it reuses the existing channelError.
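
The forwarding described above can be sketched roughly as follows. This is a simplified model under stated assumptions, not the actual diff: `SketchChannel`, `SketchStatus`, `deliverSystem`, and the placeholder default message are all hypothetical, and the real RealtimeChannel has considerably more state than this.

```dart
// Hypothetical model of the change; names and the default text are assumptions.
enum SketchStatus { subscribed, channelError }

typedef StatusCallback = void Function(SketchStatus status, Object? error);
typedef SystemListener = void Function(Map<String, dynamic> payload);

class SketchChannel {
  final List<SystemListener> _systemListeners = [];

  void onSystemEvents(SystemListener listener) =>
      _systemListeners.add(listener);

  void subscribe(StatusCallback callback) {
    // Forward only system events whose status is 'error'.
    onSystemEvents((payload) {
      if (payload['status'] != 'error') return; // happy path is ignored
      final raw = payload['message'];
      callback(
        SketchStatus.channelError,
        raw is String ? raw : 'system error (placeholder default)',
      );
    });
    // Stand-in for the join succeeding.
    callback(SketchStatus.subscribed, null);
  }

  /// Test hook standing in for a 'system' frame arriving from the server.
  void deliverSystem(Map<String, dynamic> payload) {
    for (final listener in List.of(_systemListeners)) {
      listener(payload);
    }
  }
}

void main() {
  final channel = SketchChannel();
  final log = <String>[];
  channel.subscribe((status, error) => log.add('${status.name}: $error'));
  channel.deliverSystem({'status': 'ok'});
  channel.deliverSystem({'status': 'error', 'message': 'replication failed'});
  log.forEach(print);
}
```

In this model an app can treat channelError after subscribed as a signal to tear the channel down and resubscribe, which is the "detectable, retryable" behaviour the first bullet refers to.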

Tests

Added an onSystemEvents group to channel_test.dart:

  • forwards a system status=error to the subscribe callback as channelError, carrying the server message;
  • falls back to a default message when the error payload has none;
  • a system status=ok event is not surfaced as an error (happy path unaffected).
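
For readers without the repository at hand, the three cases can be expressed as plain Dart checks against a small pure function. This is only an illustration of the cases listed above, not the contents of channel_test.dart: `systemErrorFor`, `check`, and the fallback string are hypothetical, and the real tests presumably use package:test against RealtimeChannel.

```dart
/// Hypothetical helper: returns the error text to surface for a system
/// payload, or null when the payload is not an error.
String? systemErrorFor(Map<String, dynamic> payload) {
  if (payload['status'] != 'error') return null;
  final raw = payload['message'];
  // The fallback text is a placeholder; the SDK's actual default may differ.
  return raw is String ? raw : 'Realtime system error';
}

/// Tiny assertion helper so the sketch runs without package:test.
void check(bool condition, String label) {
  if (!condition) throw StateError('failed: $label');
  print('ok: $label');
}

void main() {
  check(
    systemErrorFor({'status': 'error', 'message': 'boom'}) == 'boom',
    'status=error forwards the server message',
  );
  check(
    systemErrorFor({'status': 'error'}) == 'Realtime system error',
    'missing message falls back to a default',
  );
  check(
    systemErrorFor({'status': 'ok'}) == null,
    'status=ok is not surfaced as an error',
  );
}
```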

Verification

dart format lib test -l 80 --set-exit-if-changed   # clean
dart analyze --fatal-warnings .                     # No issues found!
dart test test/channel_test.dart                    # 44/44 passed

@daylightcreative requested a review from a team as a code owner June 25, 2026 13:27

@spydon left a comment

Contributor

LGTM, thanks for your contribution!

…Error

A `postgres_changes` channel can report `RealtimeSubscribeStatus.subscribed`
and stay joined while delivering zero events, indefinitely, when the
server-side replication subscription fails *after* the phoenix join succeeds
(e.g. RLS denies under a stale/expired access token, or the server declines).
That failure arrives as an asynchronous `system` event with `status: error`,
which the SDK only exposes via `onSystemEvents` (documented "for debugging
purposes") and never forwards to the `.subscribe()` status callback. The
result is a silent "zombie" channel: green/connected to the app, but dead.

The join reply optimistically echoes the requested `postgres_changes` config
(with server-assigned binding ids), so binding reconciliation succeeds and
`subscribed` fires before the replication is confirmed live. The real verdict
arrives on the later `system` event, which the high-level API drops.

Forward the `system` error to the subscribe callback as `channelError` so a
failed `postgres_changes` subscription becomes a detectable, retryable error
rather than a silent zombie. The happy path (`system status: ok`) is ignored,
so subscriptions that succeed are unaffected.

Closes supabase#1466

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@daylightcreative force-pushed the fix/realtime-postgres-changes-system-error branch from b324ebd to 9cf0df7 on June 25, 2026 16:01

@daylightcreative

Contributor Author

> LGTM, thanks for your contribution!

My pleasure! Glad to be a part of the community!

@spydon merged commit d127170 into supabase:main on Jun 25, 2026
26 checks passed