Commit c06a8bb

feat: add examples auto-run skill and refresh example scripts (#2303)
see also: openai/openai-agents-js#848
1 parent a4bd62a commit c06a8bb

35 files changed

Lines changed: 953 additions & 115 deletions

Lines changed: 77 additions & 0 deletions
@@ -0,0 +1,77 @@
+---
+name: examples-auto-run
+description: Run Python examples in auto mode with logging, rerun helpers, and background control.
+---
+
+# examples-auto-run
+
+## What it does
+
+- Runs `uv run examples/run_examples.py` with:
+  - `EXAMPLES_INTERACTIVE_MODE=auto` (auto-input/auto-approve).
+  - Per-example logs under `.tmp/examples-start-logs/`.
+  - Main summary log path passed via `--main-log` (also under `.tmp/examples-start-logs/`).
+- Generates a rerun list of failures at `.tmp/examples-rerun.txt` when `--write-rerun` is set.
+- Provides start/stop/status/logs/tail/collect/rerun helpers via `run.sh`.
+- `start --background` keeps the process running and records its PID in a pidfile; `stop` terminates it and removes the pidfile.
+
+## Usage
+
+```bash
+# Start (auto mode; interactive included by default)
+.codex/skills/examples-auto-run/scripts/run.sh start [extra args to run_examples.py]
+# Examples:
+.codex/skills/examples-auto-run/scripts/run.sh start --filter basic
+.codex/skills/examples-auto-run/scripts/run.sh start --include-server --include-audio
+
+# Check status
+.codex/skills/examples-auto-run/scripts/run.sh status
+
+# Stop running job
+.codex/skills/examples-auto-run/scripts/run.sh stop
+
+# List logs
+.codex/skills/examples-auto-run/scripts/run.sh logs
+
+# Tail latest log (or specify one)
+.codex/skills/examples-auto-run/scripts/run.sh tail
+.codex/skills/examples-auto-run/scripts/run.sh tail main_20260113-123000.log
+
+# Collect rerun list from a main log (defaults to latest main_*.log)
+.codex/skills/examples-auto-run/scripts/run.sh collect
+
+# Rerun only failed entries from rerun file (auto mode)
+.codex/skills/examples-auto-run/scripts/run.sh rerun
+```
+
+## Defaults (overridable via env)
+
+- `EXAMPLES_INTERACTIVE_MODE=auto`
+- `EXAMPLES_INCLUDE_INTERACTIVE=1`
+- `EXAMPLES_INCLUDE_SERVER=0`
+- `EXAMPLES_INCLUDE_AUDIO=0`
+- `EXAMPLES_INCLUDE_EXTERNAL=0`
+- Auto-approvals in auto mode: `APPLY_PATCH_AUTO_APPROVE=1`, `SHELL_AUTO_APPROVE=1`, `AUTO_APPROVE_MCP=1`
+
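The defaults listed above are applied in `run.sh` with the shell form `${VAR:-default}`, which falls back to the default when the variable is unset or empty. The sketch below restates that rule in Python for illustration only; `resolve_defaults` and `DEFAULTS` are hypothetical names and are not part of the repository.

```python
import os

# Defaults as listed in the skill description and exported by run.sh.
DEFAULTS = {
    "EXAMPLES_INTERACTIVE_MODE": "auto",
    "EXAMPLES_INCLUDE_INTERACTIVE": "1",
    "EXAMPLES_INCLUDE_SERVER": "0",
    "EXAMPLES_INCLUDE_AUDIO": "0",
    "EXAMPLES_INCLUDE_EXTERNAL": "0",
    "APPLY_PATCH_AUTO_APPROVE": "1",
    "SHELL_AUTO_APPROVE": "1",
    "AUTO_APPROVE_MCP": "1",
}


def resolve_defaults(env=None):
    """Hypothetical helper: apply ${VAR:-default} semantics to a mapping."""
    env = os.environ if env is None else env
    # `or` treats both a missing key and an empty string as "use the default",
    # which matches the ":-" form of shell parameter expansion.
    return {name: env.get(name) or default for name, default in DEFAULTS.items()}
```

Calling `resolve_defaults({"EXAMPLES_INCLUDE_SERVER": "1"})` would keep the override for the server flag and fill every other entry from `DEFAULTS`.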
+## Log locations
+
+- Main logs: `.tmp/examples-start-logs/main_*.log`
+- Per-example logs (from `run_examples.py`): `.tmp/examples-start-logs/<module_path>.log`
+- Rerun list: `.tmp/examples-rerun.txt`
+- Stdout logs: `.tmp/examples-start-logs/stdout_*.log`
+
+## Notes
+
+- The runner delegates to `uv run examples/run_examples.py`, which already writes per-example logs and supports `--collect`, `--rerun-file`, and `--print-auto-skip`.
+- `start` uses `--write-rerun` so failures are captured automatically.
+- If `.tmp/examples-rerun.txt` exists and is non-empty, invoking the skill with no args runs `rerun` by default.
+
+## Behavioral validation (Codex/LLM responsibility)
+
+The runner does not perform any automated behavioral validation. After every foreground `start` or `rerun`, **Codex must manually validate** all exit-0 entries:
+
+1. Read the example source (and comments) to infer intended flow, tools used, and expected key outputs.
+2. Open the matching per-example log under `.tmp/examples-start-logs/`.
+3. Confirm the intended actions/results occurred; flag omissions or divergences.
+4. Do this for **all passed examples**, not just a sample.
+5. Report immediately after the run with concise citations to the exact log lines that justify the validation.
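
Steps 2, 3, and 5 above lend themselves to a little tooling. The sketch below is one possible way to gather line citations and is not something the skill ships: `find_evidence` and `missing_evidence` are hypothetical helpers that scan a single per-example log for expected substrings. Choosing which strings to expect still requires reading the example source as in step 1.

```python
from pathlib import Path


def find_evidence(log_path, expected):
    """Hypothetical helper: map each expected substring to the 1-based
    line numbers in the log where it occurs (empty list means not found)."""
    hits = {needle: [] for needle in expected}
    text = Path(log_path).read_text(encoding="utf-8", errors="replace")
    for lineno, line in enumerate(text.splitlines(), start=1):
        for needle in expected:
            if needle in line:
                hits[needle].append(lineno)
    return hits


def missing_evidence(hits):
    """Return the expected substrings that never appeared in the log."""
    return [needle for needle, lines in hits.items() if not lines]
```

The returned line numbers can be quoted directly in the report, and a non-empty result from `missing_evidence` marks an example whose exit code was 0 but whose expected behavior was not observed.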
Lines changed: 215 additions & 0 deletions
@@ -0,0 +1,215 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/../../../.." && pwd)"
+PID_FILE="$ROOT/.tmp/examples-auto-run.pid"
+LOG_DIR="$ROOT/.tmp/examples-start-logs"
+RERUN_FILE="$ROOT/.tmp/examples-rerun.txt"
+
+ensure_dirs() {
+  mkdir -p "$LOG_DIR" "$ROOT/.tmp"
+}
+
+is_running() {
+  local pid="$1"
+  [[ -n "$pid" ]] && ps -p "$pid" >/dev/null 2>&1
+}
+
+cmd_start() {
+  ensure_dirs
+  local background=0
+  if [[ "${1:-}" == "--background" ]]; then
+    background=1
+    shift
+  fi
+
+  local ts main_log stdout_log
+  ts="$(date +%Y%m%d-%H%M%S)"
+  main_log="$LOG_DIR/main_${ts}.log"
+  stdout_log="$LOG_DIR/stdout_${ts}.log"
+
+  local run_cmd=(
+    uv run examples/run_examples.py
+    --auto-mode
+    --write-rerun
+    --main-log "$main_log"
+    --logs-dir "$LOG_DIR"
+  )
+
+  if [[ "$background" -eq 1 ]]; then
+    if [[ -f "$PID_FILE" ]]; then
+      local pid
+      pid="$(cat "$PID_FILE" 2>/dev/null || true)"
+      if is_running "$pid"; then
+        echo "examples/run_examples.py already running (pid=$pid)."
+        exit 1
+      fi
+    fi
+    (
+      trap '' HUP
+      export EXAMPLES_INTERACTIVE_MODE="${EXAMPLES_INTERACTIVE_MODE:-auto}"
+      export APPLY_PATCH_AUTO_APPROVE="${APPLY_PATCH_AUTO_APPROVE:-1}"
+      export SHELL_AUTO_APPROVE="${SHELL_AUTO_APPROVE:-1}"
+      export AUTO_APPROVE_MCP="${AUTO_APPROVE_MCP:-1}"
+      export EXAMPLES_INCLUDE_INTERACTIVE="${EXAMPLES_INCLUDE_INTERACTIVE:-1}"
+      export EXAMPLES_INCLUDE_SERVER="${EXAMPLES_INCLUDE_SERVER:-0}"
+      export EXAMPLES_INCLUDE_AUDIO="${EXAMPLES_INCLUDE_AUDIO:-0}"
+      export EXAMPLES_INCLUDE_EXTERNAL="${EXAMPLES_INCLUDE_EXTERNAL:-0}"
+      cd "$ROOT"
+      exec "${run_cmd[@]}" "$@" > >(tee "$stdout_log") 2>&1
+    ) &
+    local pid=$!
+    echo "$pid" >"$PID_FILE"
+    echo "Started run_examples.py (pid=$pid)"
+    echo "Main log: $main_log"
+    echo "Stdout log: $stdout_log"
+    echo "After it finishes, review the main log and validate the per-example logs in $LOG_DIR."
+    return 0
+  fi
+
+  export EXAMPLES_INTERACTIVE_MODE="${EXAMPLES_INTERACTIVE_MODE:-auto}"
+  export APPLY_PATCH_AUTO_APPROVE="${APPLY_PATCH_AUTO_APPROVE:-1}"
+  export SHELL_AUTO_APPROVE="${SHELL_AUTO_APPROVE:-1}"
+  export AUTO_APPROVE_MCP="${AUTO_APPROVE_MCP:-1}"
+  export EXAMPLES_INCLUDE_INTERACTIVE="${EXAMPLES_INCLUDE_INTERACTIVE:-1}"
+  export EXAMPLES_INCLUDE_SERVER="${EXAMPLES_INCLUDE_SERVER:-0}"
+  export EXAMPLES_INCLUDE_AUDIO="${EXAMPLES_INCLUDE_AUDIO:-0}"
+  export EXAMPLES_INCLUDE_EXTERNAL="${EXAMPLES_INCLUDE_EXTERNAL:-0}"
+  cd "$ROOT"
+  set +e
+  "${run_cmd[@]}" "$@" 2>&1 | tee "$stdout_log"
+  local run_status=${PIPESTATUS[0]}
+  set -e
+  return "$run_status"
+}
+
+cmd_stop() {
+  if [[ ! -f "$PID_FILE" ]]; then
+    echo "No pid file; nothing to stop."
+    return 0
+  fi
+  local pid
+  pid="$(cat "$PID_FILE" 2>/dev/null || true)"
+  if [[ -z "$pid" ]]; then
+    rm -f "$PID_FILE"
+    echo "Pid file empty; cleaned."
+    return 0
+  fi
+  if ! is_running "$pid"; then
+    rm -f "$PID_FILE"
+    echo "Process $pid not running; cleaned pid file."
+    return 0
+  fi
+  echo "Stopping pid $pid ..."
+  kill "$pid" 2>/dev/null || true
+  sleep 1
+  if is_running "$pid"; then
+    echo "Sending SIGKILL to $pid ..."
+    kill -9 "$pid" 2>/dev/null || true
+  fi
+  rm -f "$PID_FILE"
+  echo "Stopped."
+}
+
+cmd_status() {
+  if [[ -f "$PID_FILE" ]]; then
+    local pid
+    pid="$(cat "$PID_FILE" 2>/dev/null || true)"
+    if is_running "$pid"; then
+      echo "Running (pid=$pid)"
+      return 0
+    fi
+  fi
+  echo "Not running."
+}
+
+cmd_logs() {
+  ensure_dirs
+  ls -1t "$LOG_DIR"
+}
+
+cmd_tail() {
+  ensure_dirs
+  local file="${1:-}"
+  if [[ -z "$file" ]]; then
+    file="$(ls -1t "$LOG_DIR" | head -n1)"
+  fi
+  if [[ -z "$file" ]]; then
+    echo "No log files yet."
+    exit 1
+  fi
+  tail -f "$LOG_DIR/$file"
+}
+
+collect_rerun() {
+  ensure_dirs
+  local log_file="${1:-}"
+  if [[ -z "$log_file" ]]; then
+    log_file="$(ls -1t "$LOG_DIR"/main_*.log 2>/dev/null | head -n1 || true)"  # '|| true': with set -e and pipefail, a failed ls (no main log yet) would otherwise abort before the message below
+  fi
+  if [[ -z "$log_file" ]] || [[ ! -f "$log_file" ]]; then
+    echo "No main log file found."
+    exit 1
+  fi
+  cd "$ROOT"
+  uv run examples/run_examples.py --collect "$log_file" --output "$RERUN_FILE"
+}
+
+cmd_rerun() {
+  ensure_dirs
+  local file="${1:-$RERUN_FILE}"
+  if [[ ! -s "$file" ]]; then
+    echo "Rerun list is empty: $file"
+    exit 0
+  fi
+  local ts main_log stdout_log
+  ts="$(date +%Y%m%d-%H%M%S)"
+  main_log="$LOG_DIR/main_${ts}.log"
+  stdout_log="$LOG_DIR/stdout_${ts}.log"
+  cd "$ROOT"
+  export EXAMPLES_INTERACTIVE_MODE="${EXAMPLES_INTERACTIVE_MODE:-auto}"
+  export APPLY_PATCH_AUTO_APPROVE="${APPLY_PATCH_AUTO_APPROVE:-1}"
+  export SHELL_AUTO_APPROVE="${SHELL_AUTO_APPROVE:-1}"
+  export AUTO_APPROVE_MCP="${AUTO_APPROVE_MCP:-1}"
+  set +e
+  uv run examples/run_examples.py --auto-mode --rerun-file "$file" --write-rerun --main-log "$main_log" --logs-dir "$LOG_DIR" 2>&1 | tee "$stdout_log"
+  local run_status=${PIPESTATUS[0]}
+  set -e
+  return "$run_status"
+}
+
+usage() {
+  cat <<'EOF'
+Usage: run.sh <start|stop|status|logs|tail|collect|rerun> [args...]
+
+Commands:
+  start [--filter ... | other args]  Run examples in auto mode (foreground). Pass --background as the first argument to run detached.
+  stop                               Kill the running auto-run (if any).
+  status                             Show whether it is running.
+  logs                               List log files (.tmp/examples-start-logs).
+  tail [logfile]                     Tail the latest (or specified) log.
+  collect [main_log]                 Parse a main log and write failed examples to .tmp/examples-rerun.txt.
+  rerun [rerun_file]                 Run only the examples listed in the rerun file (default: .tmp/examples-rerun.txt).
+
+Environment overrides:
+  EXAMPLES_INTERACTIVE_MODE (default auto)
+  EXAMPLES_INCLUDE_SERVER/INTERACTIVE/AUDIO/EXTERNAL (defaults: 0/1/0/0)
+  APPLY_PATCH_AUTO_APPROVE, SHELL_AUTO_APPROVE, AUTO_APPROVE_MCP (default 1 in auto mode)
+EOF
+}
+
+default_cmd="start"
+if [[ $# -eq 0 && -s "$RERUN_FILE" ]]; then
+  default_cmd="rerun"
+fi
+
+case "${1:-$default_cmd}" in
+  start) shift || true; cmd_start "$@" ;;
+  stop) shift || true; cmd_stop ;;
+  status) shift || true; cmd_status ;;
+  logs) shift || true; cmd_logs ;;
+  tail) shift; cmd_tail "${1:-}" ;;
+  collect) shift || true; collect_rerun "${1:-}" ;;
+  rerun) shift || true; cmd_rerun "${1:-}" ;;
+  *) usage; exit 1 ;;
+esac
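
The dispatch at the end of the script picks a default command when none is given: `rerun` if the rerun file exists and is non-empty (the `-s` test), otherwise `start`. A rough Python restatement of that selection follows; `pick_command` and `KNOWN` are hypothetical names used only for illustration, and the sketch does not reproduce every shell edge case.

```python
import os

KNOWN = {"start", "stop", "status", "logs", "tail", "collect", "rerun"}


def pick_command(args, rerun_file):
    """Hypothetical mirror of run.sh's dispatch: return (command, rest)."""
    if args:
        command, rest = args[0], list(args[1:])
    else:
        # Rough equivalent of `[[ -s "$RERUN_FILE" ]]`: exists and size > 0.
        has_failures = os.path.isfile(rerun_file) and os.path.getsize(rerun_file) > 0
        command, rest = ("rerun" if has_failures else "start"), []
    if command not in KNOWN:
        # run.sh prints the usage text and exits with status 1 in this case.
        return "usage", []
    return command, rest
```

One consequence of this design is that a leftover rerun list changes what a bare invocation does, so deleting or emptying `.tmp/examples-rerun.txt` is what restores a full `start` run by default.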

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -45,6 +45,7 @@ htmlcov/
 .coverage
 .coverage.*
 .cache
+.tmp/
 nosetests.xml
 coverage.xml
 *.cover

examples/agent_patterns/agents_as_tools.py

Lines changed: 5 additions & 1 deletion
@@ -1,6 +1,7 @@
 import asyncio
 
 from agents import Agent, ItemHelpers, MessageOutputItem, Runner, trace
+from examples.auto_mode import input_with_fallback
 
 """
 This example shows the agents-as-tools pattern. The frontline agent receives a user message and
@@ -56,7 +57,10 @@
 
 
 async def main():
-    msg = input("Hi! What would you like translated, and to which languages? ")
+    msg = input_with_fallback(
+        "Hi! What would you like translated, and to which languages? ",
+        "Translate 'Hello, world!' to French and Spanish.",
+    )
 
     # Run the entire orchestration in a single trace
     with trace("Orchestrator evaluator"):
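
`input_with_fallback` is imported from `examples.auto_mode`, whose source is not among the hunks shown here. The sketch below is only a guess at its shape, assuming it returns the fallback when `EXAMPLES_INTERACTIVE_MODE` is `auto` and otherwise prompts the way `input` does. The name `input_with_fallback_sketch` is deliberately different to mark it as hypothetical; the real helper may behave differently.

```python
import os


def input_with_fallback_sketch(prompt, fallback):
    """Assumed behavior only, not the repository's implementation."""
    if os.environ.get("EXAMPLES_INTERACTIVE_MODE", "").lower() == "auto":
        # Echo the canned answer so the log shows what was "typed".
        print(f"{prompt}{fallback}")
        return fallback
    # Outside auto mode, behave like the built-in prompt.
    return input(prompt)
```

Under this assumption an example never blocks on stdin in an unattended run, while a developer running it by hand still gets the interactive prompt.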

examples/agent_patterns/agents_as_tools_conditional.py

Lines changed: 6 additions & 2 deletions
@@ -3,6 +3,7 @@
 from pydantic import BaseModel
 
 from agents import Agent, AgentBase, RunContextWrapper, Runner, trace
+from examples.auto_mode import input_with_fallback
 
 """
 This example demonstrates the agents-as-tools pattern with conditional tool enabling.
@@ -81,7 +82,7 @@ async def main():
     print("2. French and Spanish (2 tools)")
     print("3. European languages (3 tools)")
 
-    choice = input("\nSelect option (1-3): ").strip()
+    choice = input_with_fallback("\nSelect option (1-3): ", "2").strip()
     preference_map = {"1": "spanish_only", "2": "french_spanish", "3": "european"}
     language_preference = preference_map.get(choice, "spanish_only")
 
@@ -95,7 +96,10 @@ async def main():
     print(f"The LLM will only see and can use these {len(available_tools)} tools\n")
 
     # Get user request
-    user_request = input("Ask a question and see responses in available languages:\n")
+    user_request = input_with_fallback(
+        "Ask a question and see responses in available languages:\n",
+        "How do you say good morning?",
+    )
 
     # Run with LLM interaction
     print("\nProcessing request...")

examples/agent_patterns/deterministic.py

Lines changed: 5 additions & 1 deletion
@@ -3,6 +3,7 @@
 from pydantic import BaseModel
 
 from agents import Agent, Runner, trace
+from examples.auto_mode import input_with_fallback
 
 """
 This example demonstrates a deterministic flow, where each step is performed by an agent.
@@ -39,7 +40,10 @@ class OutlineCheckerOutput(BaseModel):
 
 
 async def main():
-    input_prompt = input("What kind of story do you want? ")
+    input_prompt = input_with_fallback(
+        "What kind of story do you want? ",
+        "Write a short sci-fi story.",
+    )
 
     # Ensure the entire workflow is a single trace
     with trace("Deterministic story flow"):
