diff --git a/docs/self-hosting/docker.mdx b/docs/self-hosting/docker.mdx
index df9ae246c2d..665d6fc5875 100644
--- a/docs/self-hosting/docker.mdx
+++ b/docs/self-hosting/docker.mdx
@@ -189,6 +189,7 @@ docker compose up -d
 To create additional worker groups beyond the bootstrap group, use the admin API endpoint. This requires admin privileges.
 
 **Making a user admin:**
+
 - **New users**: Set `ADMIN_EMAILS` environment variable (regex pattern) before user creation.
 - **Existing users**: Set `admin = true` in the `user` table in your database.
 
@@ -341,6 +342,18 @@ By default, the images will point at the latest versioned release via the `lates
 TRIGGER_IMAGE_TAG=v4.0.0
 ```
 
+## Task events
+
+By default, task events (timeline, logs, spans) are stored in PostgreSQL. For production deployments we recommend storing them in ClickHouse instead, because it scales to much higher volumes and avoids unbounded growth of the `TaskEvent` table.
+
+To enable this, set the following for the webapp in your `.env`:
+
+```bash
+EVENT_REPOSITORY_DEFAULT_STORE=clickhouse_v2
+```
+
+This only affects new runs; existing runs continue to read from wherever their events were originally stored.
+
 ## Troubleshooting
 
 - **Deployment fails at the push step.** The machine running `deploy` needs registry access. See the [registry setup](#registry-setup) section for more details.
@@ -359,7 +372,9 @@ TRIGGER_IMAGE_TAG=v4.0.0
 
 - **ClickHouse migrations say "no migrations to run" but schema is missing.** The goose migration tracker is out of sync. Exec into the webapp container, set the GOOSE env vars (from webapp startup logs), and run `goose reset && goose up`.
 
-  **Data Loss Warning:** The `goose reset` command is destructive and will drop the entire schema. Make sure to backup your data and confirm you are running this in a non-production environment before executing this command.
+  **Data Loss Warning:** The `goose reset` command is destructive and will drop the entire schema.
+  Make sure to back up your data and confirm you are running this in a non-production environment
+  before executing this command.
 
 ## CLI usage
 
diff --git a/docs/self-hosting/env/webapp.mdx b/docs/self-hosting/env/webapp.mdx
index 3a45bf7b04b..17ce851753b 100644
--- a/docs/self-hosting/env/webapp.mdx
+++ b/docs/self-hosting/env/webapp.mdx
@@ -123,6 +123,8 @@ mode: "wide"
 | `TRIGGER_OTEL_ATTRIBUTE_PER_LINK_COUNT_LIMIT` | No | 10 | OTel attribute per link count limit. |
 | `TRIGGER_OTEL_ATTRIBUTE_PER_EVENT_COUNT_LIMIT` | No | 10 | OTel attribute per event count limit. |
 | `SERVER_OTEL_SPAN_ATTRIBUTE_VALUE_LENGTH_LIMIT` | No | 8192 | OTel span attribute value length limit. |
+| **Task events** | | | |
+| `EVENT_REPOSITORY_DEFAULT_STORE` | No | postgres | Where to store task events. Set to `clickhouse_v2` to store them in ClickHouse (recommended for production). |
 | **Realtime** | | | |
 | `REALTIME_STREAM_MAX_LENGTH` | No | 1000 | Realtime stream max length. |
 | `REALTIME_STREAM_TTL` | No | 86400 (1d) | Realtime stream TTL (s). |
diff --git a/docs/self-hosting/kubernetes.mdx b/docs/self-hosting/kubernetes.mdx
index eba66f4ed07..8fbf8aab90a 100644
--- a/docs/self-hosting/kubernetes.mdx
+++ b/docs/self-hosting/kubernetes.mdx
@@ -354,6 +354,27 @@ webapp:
 - Compatible with secret management tools (External Secrets Operator, etc.)
 - Follows Kubernetes security best practices
 
+## DNS performance
+
+For production clusters we recommend deploying [NodeLocal DNSCache](https://kubernetes.io/docs/tasks/administer-cluster/nodelocaldns/). DNS queries, especially to managed Postgres or Redis endpoints, can be very slow under Kubernetes' default resolver, and a node-local cache typically improves latency and throughput significantly across the cluster.
+
+With the default `ndots: 5` setting, hostnames with fewer than five dots (the case for most external database hosts) are also tried against every cluster search domain before being resolved as absolute names. Lowering `ndots` to `1` on the webapp and supervisor pods avoids those extra round-trips.
+
+## Task events
+
+By default, task events (timeline, logs, spans) are stored in PostgreSQL. For production deployments we recommend storing them in ClickHouse instead, because it scales to much higher volumes and avoids unbounded growth of the `TaskEvent` table.
+
+ClickHouse is already deployed by the chart, so no extra services are required. To enable this, set `EVENT_REPOSITORY_DEFAULT_STORE` on the webapp via `extraEnvVars`:
+
+```yaml
+webapp:
+  extraEnvVars:
+    - name: EVENT_REPOSITORY_DEFAULT_STORE
+      value: "clickhouse_v2"
+```
+
+This only affects new runs; existing runs continue to read from wherever their events were originally stored.
+
 ## Worker token
 
 When using the default bootstrap configuration, worker creation and authentication is handled automatically. The webapp generates a worker token and makes it available to the supervisor via a shared volume.