Merged
37 changes: 27 additions & 10 deletions apps/docs/content/guides/database/replication/bigquery.mdx
@@ -29,17 +29,22 @@ Before configuring BigQuery as a destination, set up the following in Google Clo
3. **GCP service account key**: Create a [service account](https://cloud.google.com/iam/docs/keys-create-delete) with appropriate permissions
- Go to **IAM & Admin > Service Accounts**
- Click **Create Service Account**
- Grant the "BigQuery Data Editor" role
- Grant the "BigQuery Data Editor" and "BigQuery Job User" roles
- Create and download the JSON key file

Required permissions:

- `bigquery.datasets.get`
- `bigquery.jobs.create`
- `bigquery.tables.create`
- `bigquery.tables.delete`
- `bigquery.tables.get`
- `bigquery.tables.getData`
- `bigquery.tables.list`
- `bigquery.tables.update`
- `bigquery.tables.updateData`
- `bigquery.routines.get`
- `bigquery.routines.list`
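
The permission list above can be turned into a quick pre-flight check. This is a minimal sketch, not part of the replication product: `missing_permissions` is a hypothetical helper, and it assumes you already have the set of permissions granted to the service account from whatever IAM tooling you use.

```python
# Permissions copied from the "Required permissions" list in the guide.
REQUIRED_PERMISSIONS = frozenset({
    "bigquery.datasets.get",
    "bigquery.jobs.create",
    "bigquery.tables.create",
    "bigquery.tables.delete",
    "bigquery.tables.get",
    "bigquery.tables.getData",
    "bigquery.tables.list",
    "bigquery.tables.update",
    "bigquery.tables.updateData",
    "bigquery.routines.get",
    "bigquery.routines.list",
})


def missing_permissions(granted):
    """Return the required permissions that are absent from `granted`, sorted.

    `granted` is any iterable of permission strings (hypothetical input).
    """
    return sorted(REQUIRED_PERMISSIONS - set(granted))
```

An empty result means every listed permission is present. Anything returned points at a role that still needs to be granted.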

## Configure BigQuery as a destination

@@ -89,15 +94,21 @@ BigQuery replication requires each source table to have a primary key, and the p

BigQuery primary keys are `NOT ENFORCED`, and BigQuery CDC supports composite primary keys with up to 16 columns. Your source primary key must stay unique and non-null because BigQuery uses it to match CDC rows.

Source tables must also use a BigQuery-compatible Postgres `REPLICA IDENTITY` setting:
Source tables must also use a BigQuery-compatible Postgres `REPLICA IDENTITY` setting. Most tables can keep the Postgres default, as long as they have a primary key and all primary-key columns are included in the publication.

- `DEFAULT` with a primary key is supported and works for most tables.
- `FULL` is supported and is recommended for tables with large `text`, `jsonb`, `bytea`, or other values that Postgres may store out-of-line using TOAST.
- `USING INDEX`, `NOTHING`, or `DEFAULT` without a primary key are not supported for BigQuery replication.
| Source table setting | BigQuery support | Guidance |
| ------------------------------------------------ | ---------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `REPLICA IDENTITY DEFAULT` with a primary key | Supported | Recommended for most tables. BigQuery uses the replicated source primary key to apply upserts and deletes. |
| `REPLICA IDENTITY FULL` | Supported | Recommended for tables with large `text`, `jsonb`, `bytea`, or other values that Postgres may store out-of-line using TOAST, especially when those rows update. |
| `REPLICA IDENTITY USING INDEX` | Not supported | BigQuery CDC rows are keyed by the source primary key, not by an alternative unique index. |
| `REPLICA IDENTITY NOTHING` | Not supported | Updates and deletes do not include enough row identity for BigQuery to apply them safely. |
| `REPLICA IDENTITY DEFAULT` without a primary key | Not supported | BigQuery requires a source primary key. |
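
The support matrix can be expressed as a small lookup. This is a sketch under stated assumptions: the function name is hypothetical, the one-letter codes are the values Postgres stores in `pg_class.relreplident` (`d`, `f`, `i`, `n`), and `FULL` without a primary key is treated as unsupported because the guide requires every source table to have one.

```python
# pg_class.relreplident codes: d = DEFAULT, f = FULL, i = USING INDEX, n = NOTHING
KNOWN_REPLICA_IDENTITIES = {"d", "f", "i", "n"}


def is_supported_for_bigquery(relreplident, has_primary_key):
    """Return True if the table's setting matches a 'Supported' row in the table.

    Hypothetical helper: it encodes the documented matrix, not pipeline logic.
    """
    if relreplident not in KNOWN_REPLICA_IDENTITIES:
        raise ValueError(f"unknown replica identity code: {relreplident!r}")
    if not has_primary_key:
        # Assumption: the primary-key requirement applies to every setting.
        return False
    return relreplident in ("d", "f")
```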

For a general explanation of how replica identity affects update and delete events, see [How does replica identity affect updates and deletes?](/docs/guides/database/replication/external-replication-faq#how-does-replica-identity-affect-updates-and-deletes).

For updates, Postgres does not always send a complete old row through logical replication. It can also mark unchanged toasted values as `unchanged toast` instead of resending the value. The replication pipeline can reconstruct a complete update when the old row image contains the missing value, which is reliable with `REPLICA IDENTITY FULL`. BigQuery CDC upserts require a complete new row, so updates can fail for tables with toasted columns if the pipeline receives only a partial update row.
For updates, Postgres does not always send a complete old row through logical replication. It can also mark unchanged toasted values as `unchanged toast` instead of resending the value. BigQuery CDC upserts require a complete new row because omitted columns are not preserved in the destination. The replication pipeline can reconstruct a complete update when the old row image contains the missing value, which is reliable with `REPLICA IDENTITY FULL`.

If a BigQuery pipeline fails with an error about a partial update row, set `REPLICA IDENTITY FULL` on the affected source table and restart the pipeline. Changing replica identity only affects new WAL records, so a retained update that was written before the change may still need to be skipped by recreating the pipeline or re-copying the affected table.

Check a table's current replica identity:

@@ -138,16 +149,22 @@ Schema change support for BigQuery is currently in beta. External replication su

Supported schema changes:

- Adding a column
- Adding a nullable column
- Removing a column
- Renaming a column
- Dropping a `NOT NULL` constraint
- Setting or dropping supported column default metadata

Unsupported schema changes:
Unsupported or limited schema changes:

- Changing a column's data type
- Replicating column default values
- Adding `NOT NULL` with `SET NOT NULL`
- Filling existing rows for `ADD COLUMN ... DEFAULT`
- Unsupported default expressions

BigQuery requires added columns to be nullable. When a replicated `ADD COLUMN` includes a default, external replication can apply supported default metadata for future rows, but BigQuery does not backfill existing rows through that DDL. Existing destination rows remain `NULL` unless you run a separate backfill.

We plan to expand schema change support over time as the feature evolves.
Supported defaults are best-effort translations to BigQuery SQL. Unsupported defaults are skipped with a warning instead of failing replication.
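
To make the `ADD COLUMN ... DEFAULT` behavior concrete, here is a toy in-memory model. It is only an illustration of the semantics described above (existing rows stay `NULL`, the default applies to future rows, and a backfill is a separate step). All function names are hypothetical and nothing here calls BigQuery.

```python
def add_column_with_default(existing_rows, column, default):
    """Model a replicated ADD COLUMN ... DEFAULT at the destination.

    Returns the existing rows with the new column set to None (NULL) and a
    function that builds future rows, filling in the default when the
    column is omitted.
    """
    migrated = [{**row, column: None} for row in existing_rows]

    def insert(row):
        # Values supplied in `row` win over the default.
        return {column: default, **row}

    return migrated, insert


def backfill(rows, column, value):
    """Separate backfill step: replace NULLs in `column` with `value`."""
    return [
        {**row, column: value} if row.get(column) is None else row
        for row in rows
    ]
```

In this model, `backfill` is the only step that changes rows that existed before the column was added.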

## Limitations

@@ -45,11 +45,13 @@ Schema change support is currently in beta and limited to the BigQuery destinati

Supported schema changes:

- Adding a column
- Adding a nullable column
- Removing a column
- Renaming a column
- Dropping a `NOT NULL` constraint
- Setting or dropping supported column default metadata

External replication does not currently support changing column data types or replicating column default values. See [BigQuery schema change support](/docs/guides/database/replication/bigquery#schema-change-support) for details.
External replication does not currently support changing column data types, adding `NOT NULL` with `SET NOT NULL`, or filling existing destination rows for `ADD COLUMN ... DEFAULT`. Supported defaults are applied as destination metadata for future rows where BigQuery can represent them. See [BigQuery schema change support](/docs/guides/database/replication/bigquery#schema-change-support) for details.

## What happens when you disable external replication?

@@ -69,6 +71,14 @@ Common reasons:

Check your publication settings and verify your table meets the requirements.

## Why are partitioned tables replicated as separate tables?

Postgres controls this with the publication's `publish_via_partition_root` setting. If the setting is `false`, or if you created the publication manually with SQL and did not set it, Postgres publishes changes from the leaf partitions, and external replication creates a destination table for each leaf partition. If `publish_via_partition_root = true`, Postgres publishes changes under the partition root, so external replication treats the whole partition hierarchy as the published partition root.

Publications created from the Dashboard replication flow use `publish_via_partition_root = true`.

See [Partitioned tables](/docs/guides/database/replication/external-replication-setup#partitioned-tables) for examples and the full behavior.
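
The effect of the setting on the destination can be sketched as a one-line decision. This is an illustrative model only: the function and the table names in the usage note are hypothetical, and it assumes one destination table per published relation.

```python
def destination_tables(partition_root, leaf_partitions, publish_via_partition_root):
    """Return the source relations that would appear as destination tables.

    With publish_via_partition_root enabled, changes are published under the
    root, so only the root appears. Otherwise each leaf partition appears.
    """
    if publish_via_partition_root:
        return [partition_root]
    return list(leaf_partitions)
```

For a hypothetical `orders` table partitioned into `orders_2024` and `orders_2025`, the model yields `["orders"]` when the setting is `true` and both leaf names when it is `false`.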

## How does replica identity affect updates and deletes?

If inserts replicate but updates or deletes fail, check the table's `REPLICA IDENTITY` setting.
@@ -132,6 +142,20 @@ Pipeline failures occur during the streaming phase when an error happens while r

See [Handling errors](/docs/guides/database/replication/external-replication-monitoring#handling-errors) for more details.

## Why is replication lag increasing?

Lag increases when Postgres produces WAL faster than the pipeline can confirm that it has processed it. Common causes include a slow or rate-limited destination, a pipeline issue, heavy source database activity, long transactions, network latency between the pipeline and source database, or a stopped or disconnected pipeline.

Open [**Database > Replication**](/dashboard/project/_/database/replication), click **View status**, and check **Waiting to sync**, **Room before pausing**, **Last check-in**, **Connected**, and **Slot status**. See [Dealing with replication lag](/docs/guides/database/replication/external-replication-monitoring#dealing-with-replication-lag) for the full investigation and response flow.
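
Lag can also be reasoned about directly in terms of WAL positions. The sketch below assumes you have read two LSN strings from Postgres yourself, for example `pg_current_wal_lsn()` and the slot's `confirmed_flush_lsn` from `pg_replication_slots`. The helper names are hypothetical, and how the Dashboard computes **Waiting to sync** is not specified here.

```python
def parse_lsn(lsn):
    """Convert a Postgres LSN such as '16/B374D848' to a byte position.

    An LSN is two hexadecimal numbers: the high and low 32 bits of a
    64-bit WAL position.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def lag_bytes(current_wal_lsn, confirmed_flush_lsn):
    """Bytes of WAL written by Postgres but not yet confirmed by the slot."""
    return parse_lsn(current_wal_lsn) - parse_lsn(confirmed_flush_lsn)
```

A value that keeps growing between checks suggests the pipeline is falling behind. A value that stays steady or shrinks suggests it is keeping up.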

## What does a `Lost` slot status mean?

`Lost` means Postgres has already removed WAL files that the pipeline's replication slot needed. The pipeline cannot continue from that slot.

You can recreate the pipeline, or open the pipeline's **Advanced settings**, set **Invalidated slot behavior** to **Recreate**, and restart the pipeline. On restart, the pipeline creates a new replication slot and starts replication from scratch for all tables. This is required for consistency because the old slot can no longer provide every change the pipeline missed.

See [Slot statuses](/docs/guides/database/replication/external-replication-monitoring#slot-statuses) for all slot states and what to do next.

## Why is a table in error state?

Table errors occur during the copy phase. To recover, click **View status**, find the affected table, and reset the table state. This will restart the table copy from the beginning.