Merged
2 changes: 0 additions & 2 deletions AGENTS.md
@@ -775,8 +775,6 @@ npm run rollback -- <org> --list # List available snapshots
# Testing
npm run call -- <org> -a <assistant-name> # Call an assistant via WebSocket
npm run call -- <org> -s <squad-name> # Call a squad via WebSocket
-npm run eval -- <org> -s <squad-name> # Run evals against a squad
-npm run eval -- <org> -a <assistant-name> # Run evals against an assistant

# Maintenance
npm run cleanup -- <org> # Dry-run: show orphaned remote resources
29 changes: 0 additions & 29 deletions README.md
@@ -76,7 +76,6 @@ Every command works in two modes:
| `npm run apply` | ✅ | `npm run apply -- <org> [--force]` | Pull → Merge → Push in one shot |
| `npm run call` | ✅ | `npm run call -- <org> -a <name>` | Start a WebSocket call |
| `npm run cleanup` | ✅ | `npm run cleanup -- <org> [--force --confirm <org>]` | Delete orphaned remote resources (destructive run requires `--confirm <org>`) |
-| `npm run eval` | — | `npm run eval -- <org> -s <squad>` | Run evals against an assistant/squad |
| `npm run build` | — | — | Type-check the codebase |
| `npm test` | — | — | Run regression tests (`node:test`) |

@@ -142,10 +141,6 @@ npm run call -- my-org -a my-assistant

# Call a squad
npm run call -- my-org -s my-squad
-
-# Run evals
-npm run eval -- my-org -s my-squad
-npm run eval -- my-org -a my-assistant --filter booking
```

---
@@ -282,29 +277,6 @@ Squad push
└─ all references resolved → create the squad ✓
```

-### Running Evals
-
-Evals run mock conversations against an assistant or squad and check assertions.
-
-```bash
-# Run all evals against a squad (transient — loaded from local files)
-npm run eval -- my-org -s my-squad
-
-# Run a specific eval by name filter
-npm run eval -- my-org -a my-assistant --filter booking
-
-# Use stored assistant/squad IDs from state (already pushed)
-npm run eval -- my-org -s my-squad --stored
-
-# Load assistant from a specific file path
-npm run eval -- my-org -a resources/my-org/assistants/qa-tester.md
-
-# Provide variable overrides
-npm run eval -- my-org -s my-squad -v eval-variables.json
-```
-
-Evals must be pushed first (`npm run push -- my-org evals`). Eval definitions live in `resources/<org>/evals/*.yml`.
-
---

## File Formats
@@ -536,7 +508,6 @@ vapi-gitops/
│ ├── push.ts # Push local state to platform
│ ├── apply.ts # Orchestrator: pull → merge → push
│ ├── call.ts # WebSocket call script
-│ ├── eval.ts # Eval runner
│ ├── cleanup.ts # Orphan cleanup
│ ├── pull-cmd.ts # Entry point: interactive or direct pull
│ ├── push-cmd.ts # Entry point: interactive or direct push
1 change: 0 additions & 1 deletion package.json
@@ -12,7 +12,6 @@
"pull": "tsx src/pull-cmd.ts",
"call": "bash -c 'exec tsx src/call-cmd.ts \"$@\" 2> >(grep --line-buffered -v \"buffer underflow\" >&2)' --",
"cleanup": "tsx src/cleanup-cmd.ts",
-"eval": "tsx src/eval.ts",
"validate": "tsx src/validate-cmd.ts",
"sim": "tsx src/sim-cmd.ts",
"rollback": "tsx src/rollback-cmd.ts",