17 changes: 16 additions & 1 deletion docs/self-hosting/docker.mdx
@@ -189,6 +189,7 @@ docker compose up -d
To create additional worker groups beyond the bootstrap group, use the admin API endpoint. This requires admin privileges.

**Making a user admin:**

- **New users**: Set the `ADMIN_EMAILS` environment variable (a regex pattern) before the user is created.
- **Existing users**: Set `admin = true` in the `user` table in your database.

@@ -341,6 +342,18 @@ By default, the images will point at the latest versioned release via the `lates
TRIGGER_IMAGE_TAG=v4.0.0
```

## Task events

By default, task events (timeline, logs, spans) are stored in PostgreSQL. For production deployments we recommend storing them in ClickHouse instead, because it scales to much higher volumes and avoids unbounded growth of the `TaskEvent` table.

To enable this, set the following for the webapp in your `.env`:

```bash
EVENT_REPOSITORY_DEFAULT_STORE=clickhouse_v2
```

This only affects new runs; existing runs continue to read from wherever their events were originally stored.

## Troubleshooting

- **Deployment fails at the push step.** The machine running `deploy` needs registry access. See the [registry setup](#registry-setup) section for more details.
@@ -359,7 +372,9 @@ TRIGGER_IMAGE_TAG=v4.0.0
- **ClickHouse migrations say "no migrations to run" but schema is missing.** The goose migration tracker is out of sync. Exec into the webapp container, set the GOOSE env vars (from webapp startup logs), and run `goose reset && goose up`.

<Warning>
**Data Loss Warning:** The `goose reset` command is destructive and will drop the entire schema.
Make sure to back up your data and confirm you are running this in a non-production environment
before executing this command.
</Warning>

## CLI usage
2 changes: 2 additions & 0 deletions docs/self-hosting/env/webapp.mdx
@@ -123,6 +123,8 @@ mode: "wide"
| `TRIGGER_OTEL_ATTRIBUTE_PER_LINK_COUNT_LIMIT` | No | 10 | OTel attribute per link count limit. |
| `TRIGGER_OTEL_ATTRIBUTE_PER_EVENT_COUNT_LIMIT` | No | 10 | OTel attribute per event count limit. |
| `SERVER_OTEL_SPAN_ATTRIBUTE_VALUE_LENGTH_LIMIT` | No | 8192 | OTel span attribute value length limit. |
| **Task events** | | | |
| `EVENT_REPOSITORY_DEFAULT_STORE` | No | postgres | Where to store task events. Set to `clickhouse_v2` to store in ClickHouse (recommended for production). |
| **Realtime** | | | |
| `REALTIME_STREAM_MAX_LENGTH` | No | 1000 | Realtime stream max length. |
| `REALTIME_STREAM_TTL` | No | 86400 (1d) | Realtime stream TTL (s). |
21 changes: 21 additions & 0 deletions docs/self-hosting/kubernetes.mdx
@@ -354,6 +354,27 @@ webapp:
- Compatible with secret management tools (External Secrets Operator, etc.)
- Follows Kubernetes security best practices

## DNS performance

For production clusters we recommend deploying [NodeLocal DNSCache](https://kubernetes.io/docs/tasks/administer-cluster/nodelocaldns/). DNS queries, especially to managed Postgres or Redis endpoints, can be very slow under Kubernetes' default resolver, and a node-local cache typically improves latency and throughput substantially across the cluster.

The default `ndots: 5` setting also means that any hostname with fewer than five dots (the case for most external database hosts) is tried against every cluster search domain before it is resolved as an absolute name. Lowering `ndots` to `1` on the webapp and supervisor pods avoids those extra round-trips.
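
As a rough sketch of what lowering `ndots` looks like at the pod level, the fragment below uses the standard Kubernetes `dnsConfig` field. How (or whether) the chart exposes this for the webapp and supervisor pods is not covered here, so treat the placement as an assumption and check the chart's values before applying it.

```yaml
# Pod spec fragment (standard Kubernetes API), not a chart values file.
# Where this belongs in your Helm values is an assumption to verify.
spec:
  dnsConfig:
    options:
      - name: ndots
        value: "1"
```

With `ndots: 1`, any hostname containing at least one dot is resolved as an absolute name first, and the search domains are only tried if that lookup fails.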

## Task events

By default, task events (timeline, logs, spans) are stored in PostgreSQL. For production deployments we recommend storing them in ClickHouse instead, because it scales to much higher volumes and avoids unbounded growth of the `TaskEvent` table.

ClickHouse is already deployed by the chart, so no extra services are required. To enable this, set `EVENT_REPOSITORY_DEFAULT_STORE` on the webapp via `extraEnvVars`:

```yaml
webapp:
  extraEnvVars:
    - name: EVENT_REPOSITORY_DEFAULT_STORE
      value: "clickhouse_v2"
```

This only affects new runs; existing runs continue to read from wherever their events were originally stored.

## Worker token

When using the default bootstrap configuration, worker creation and authentication are handled automatically. The webapp generates a worker token and makes it available to the supervisor via a shared volume.