
Fix pdb / breakpoint() hang in workflow code #1568

Open
elidlocke wants to merge 2 commits into temporalio:main from elidlocke:pdb-hang-repro


@elidlocke


What was changed

When debug_mode=True on the Worker (or TEMPORAL_DEBUG=1), breakpoint() inside workflow code now opens an interactive pdb prompt — including from a sandboxed workflow run under pytest. Four pieces:

  • Inline dispatch on the main thread. Activations run on the asyncio main thread (scheduled via loop.call_soon to avoid nesting inside the dispatch task's __step()), so pdb's input() reaches the TTY.
  • Targeted sandbox relaxation. breakpoint is removed from the sandbox's invalid builtins so the call can reach the worker hook. Nothing else is relaxed.
  • Custom Pdb subclass. Drops into pdb at the workflow's own frame (not our indirection), suspends sandbox checks for the duration of each REPL interaction, and overrides q / Ctrl-D to continue the workflow instead of failing it with BdbQuit.
  • Defensive sys.breakpointhook. Calling breakpoint() from a workflow worker thread without debug_mode raises a clear RuntimeError instead of silently hanging.
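The q / Ctrl-D override from the third piece can be illustrated with the standard library alone. This is a minimal sketch, not the PR's subclass: the names ContinuingPdb and workflow_like_step are hypothetical, and the sketch omits the workflow-frame selection and sandbox suspension the real subclass performs. One detail matters for any such override: pdb.Pdb defines do_q and do_exit as class-level aliases of its own do_quit, so a subclass has to rebind them too, or typing q still reaches the base implementation and raises BdbQuit.

```python
import io
import pdb
import sys


class ContinuingPdb(pdb.Pdb):
    """Hypothetical Pdb subclass: q, exit, and Ctrl-D resume execution
    instead of raising bdb.BdbQuit into the debugged code."""

    def do_quit(self, arg):
        self.message("Leaving the debugger; execution continues.")
        # do_continue stops tracing (when no breakpoints are set), and
        # returning its result (1) ends the command loop.
        return self.do_continue("")

    # pdb.Pdb binds these aliases to its own do_quit, so rebind them here.
    do_q = do_quit
    do_exit = do_quit

    def do_EOF(self, arg):
        # End of input (Ctrl-D at a terminal) behaves like q.
        self.message("")
        return self.do_continue("")


def workflow_like_step(commands: str, out: io.StringIO) -> int:
    """Hypothetical stand-in for workflow code that hits a breakpoint.

    Commands are fed from a string so the session is scriptable."""
    x = 42
    debugger = ContinuingPdb(
        stdin=io.StringIO(commands),
        stdout=out,
        nosigint=True,  # do not install a SIGINT handler on continue
        readrc=False,   # ignore any ~/.pdbrc
    )
    debugger.set_trace(sys._getframe())
    return x + 1
```

For example, `workflow_like_step("p x\nq\n", io.StringIO())` prints 42 at the prompt and then returns 43 instead of raising BdbQuit; an empty command string exercises the Ctrl-D path the same way.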

When debug_mode is not set, the worker's dispatch and sandbox config are unchanged. The defensive hook replaces a silent hang with a clear error — strictly an improvement, not a change to working code.

Why?

breakpoint() and pdb.set_trace() inside workflow code silently hang today. Three overlapping issues:

  1. Activations run on a ThreadPoolExecutor thread, so pdb's input() can't read the controlling TTY.
  2. The sandbox flags breakpoint as non-deterministic, so the call doesn't reach the debugger.
  3. pdb's cmdloop touches additional sandbox-restricted internals at runtime (e.g. readline.get_completer), so relaxing the builtin alone isn't enough.
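Issue 1 can be observed without Temporal: a function handed to a ThreadPoolExecutor runs off the main thread, while a function dispatched inline from the event loop runs on MainThread. This is a hypothetical stand-in for the two dispatch modes, not the worker's code; the temporal_workflow thread-name prefix is an assumption modeled on the temporal_workflow_* check mentioned in the review thread.

```python
import asyncio
import threading
from concurrent.futures import ThreadPoolExecutor


def activation_thread() -> str:
    """Stand-in for running one workflow activation; reports its thread."""
    return threading.current_thread().name


async def dispatch(debug_mode: bool) -> str:
    loop = asyncio.get_running_loop()
    if debug_mode:
        # Inline dispatch: runs on the event loop's thread, which is
        # MainThread when asyncio.run() is called from the main thread.
        # (The PR schedules this via loop.call_soon; a direct call suffices
        # here because this stand-in does not step another task.)
        return activation_thread()
    # Pool dispatch: runs on a worker thread, where (per the PR description)
    # pdb's input() cannot read the controlling TTY.
    with ThreadPoolExecutor(
        max_workers=1, thread_name_prefix="temporal_workflow"
    ) as pool:
        return await loop.run_in_executor(pool, activation_thread)
```

Run from a script's main thread, the inline mode reports "MainThread" and the pool mode reports a name such as "temporal_workflow_0", which mirrors the thread-placement checks described in the testing notes.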

Direct synchronous activation from the dispatch coroutine doesn't work on Python 3.14:

RuntimeError: Cannot enter into task <workflow run task>
  while another task <_handle_activation> is being executed.

The dispatch task is still inside its __step() when workflow.activate tries to step the workflow's own task, and 3.14 refuses. Scheduling the activation with loop.call_soon and then awaiting a future suspends the dispatch task first, so the workflow task can be entered.
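The ordering can be sketched with plain asyncio. This is a hypothetical illustration, not the worker's implementation: handle_activation and activate are stand-ins. It shows why the pattern avoids the 3.14 error: a callback scheduled with loop.call_soon only runs after the dispatch task has suspended on the future, so asyncio.current_task() is None inside it and no task is mid-__step().

```python
import asyncio


def activate(seen: list) -> str:
    """Stand-in for a synchronous workflow activation (hypothetical).

    Records which task, if any, is executing when it runs."""
    seen.append(asyncio.current_task())
    return "activation complete"


async def handle_activation(seen: list) -> str:
    loop = asyncio.get_running_loop()
    seen.append(asyncio.current_task())  # the dispatch task itself
    done = loop.create_future()

    def run_inline() -> None:
        if done.cancelled():
            return  # dispatch task was cancelled before the callback ran
        try:
            done.set_result(activate(seen))
        except Exception as exc:
            done.set_exception(exc)

    # Schedule instead of calling directly, then suspend on the future:
    # by the time run_inline executes, this task's __step() has returned.
    loop.call_soon(run_inline)
    return await done
```

After `asyncio.run(handle_activation(seen))`, the first entry in `seen` is the dispatch task and the second is None, confirming the activation ran between task steps.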

Complements #1249 (sandbox passthrough for IDE debuggers). Independent change, different debugger.

Checklist

  1. Closes #1104 (Setting debug_mode in a Worker still doesn't allow the user of breakpoints)

  2. How was this tested:

  • tests/worker/test_breakpoint_hang.py: five tests covering thread placement (with and without debug_mode), a breakpoint in a sandboxed workflow landing at the user's frame with locals visible, q/Ctrl-D continuing the workflow cleanly, and the defensive hook raising. 5/5 pass on Python 3.13 and 3.14.
  • Manual: drop breakpoint() into any workflow's run() body, run it via pytest -s (or a standalone Python script), and confirm the (Pdb) prompt opens at the user's frame with locals in scope.
  3. Any docs updates needed?
  • Yes. Adds a "Debugging Workflows with breakpoint() / pdb" subsection to the README under Workflow Sandbox, with a runnable example and the workflow-task-timeout caveat.

When debug_mode=True (or TEMPORAL_DEBUG=1), breakpoint() inside workflow
code now opens an interactive pdb prompt -- including from a sandboxed
workflow run under pytest. Four pieces:

- Inline dispatch on the asyncio main thread (via loop.call_soon to
  avoid nesting inside the dispatch task's __step() and tripping
  Python 3.14's task-entry validation).
- breakpoint removed from the sandbox's invalid builtins so the call
  reaches the worker hook. Nothing else is relaxed.
- A Pdb subclass that lands at the workflow's own frame, suspends
  sandbox checks during each REPL interaction, and overrides q/Ctrl-D
  to continue the workflow instead of failing it with BdbQuit.
- A defensive sys.breakpointhook that raises a clear RuntimeError when
  breakpoint() is called from a workflow worker thread without
  debug_mode, replacing the previous silent hang.

When debug_mode is not set, the worker's dispatch and sandbox config
are unchanged.

Adds a README subsection on debugging workflows and five tests at
tests/worker/test_breakpoint_hang.py. Verified on Python 3.13 and 3.14.

Closes temporalio#1104.
elidlocke requested a review from tconley1428 on June 8, 2026 16:42
Review thread on temporalio/worker/_workflow.py (outdated):

-self._deadlock_timeout_seconds = None if debug_mode else 2
+self._deadlock_timeout_seconds = None if self._debug_mode else 2

+_install_workflow_breakpoint_hook()
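The change from debug_mode to self._debug_mode ties the deadlock timeout to the worker's effective debug flag rather than only the constructor argument. A minimal sketch of that resolution follows, assuming (hypothetically) that the effective flag is the debug_mode argument or a TEMPORAL_DEBUG value of "1" or "true"; resolve_debug_mode and deadlock_timeout_seconds are hypothetical helpers, and the SDK's actual parsing of the variable may differ.

```python
import os
from typing import Mapping, Optional


def resolve_debug_mode(debug_mode: bool, environ: Mapping[str, str]) -> bool:
    """Hypothetical: effective debug mode from the Worker argument or env.

    The accepted TEMPORAL_DEBUG values here are an assumption."""
    if debug_mode:
        return True
    return environ.get("TEMPORAL_DEBUG", "").strip().lower() in ("1", "true")


def deadlock_timeout_seconds(
    debug_mode: bool, environ: Mapping[str, str] = os.environ
) -> Optional[int]:
    # In this sketch None means "no deadlock timeout", matching the diff's
    # behavior in debug mode; otherwise the 2-second limit applies.
    return None if resolve_debug_mode(debug_mode, environ) else 2
```

Passing the environment as a parameter keeps the helper testable without mutating os.environ.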

Contributor


This should probably only happen in debug mode as well. It may not make any practical difference, but it would give assurance that nothing changes outside of debug mode.

elidlocke, Jun 8, 2026

Author


The hook is intentionally always installed. The only case it catches is breakpoint() called from workflow code without debug_mode set, which is the original silent hang from #1104. Gating it on debug_mode would remove the error in exactly the scenario that I think should be turned from a silent hang into a loud error.

When debug_mode is on, the dispatch fix routes the workflow to MainThread, so the hook's temporal_workflow_* check never matches and it just delegates. No observable change.

Should I add a code comment making the always-on rationale explicit?
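The always-on behavior discussed in this thread can be sketched as follows, including the kind of rationale comment proposed. This is a hypothetical sketch, not the PR's code: the function name, the temporal_workflow_ thread-name prefix, and the error text are assumptions. The guard raises only on threads whose name matches the prefix and otherwise delegates to whatever hook was installed before it, which is why it stays inert when debug mode places activations on MainThread.

```python
import sys
import threading

# Assumed name prefix for workflow activation threads (hypothetical).
WORKFLOW_THREAD_PREFIX = "temporal_workflow_"


def install_workflow_breakpoint_hook() -> None:
    """Wrap sys.breakpointhook with a workflow-thread guard (sketch).

    Installed unconditionally on purpose: without debug_mode, breakpoint()
    in workflow code would otherwise hang silently on a pool thread. With
    debug_mode, activations run on MainThread, so the guard never matches
    and simply delegates to the previous hook."""
    previous = sys.breakpointhook
    if getattr(previous, "_workflow_guard", False):
        return  # already installed; avoid wrapping twice

    def guard(*args, **kwargs):
        if threading.current_thread().name.startswith(WORKFLOW_THREAD_PREFIX):
            raise RuntimeError(
                "breakpoint() called in workflow code without debug_mode; "
                "enable debug_mode on the Worker (or set TEMPORAL_DEBUG=1)"
            )
        return previous(*args, **kwargs)

    guard._workflow_guard = True
    sys.breakpointhook = guard
```

Because the builtin breakpoint() always calls sys.breakpointhook, the guard sees every call; delegating rather than replacing keeps any previously configured hook (including PYTHONBREAKPOINT handling in the default hook) working on non-workflow threads.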
