Skip to content

Pane(fix[wait_for_text]): waits for new output, joins wrapped lines, prefers wait-for-channel for completion#47

Merged
tony merged 31 commits into
mainfrom
stale-wait-for-text
May 16, 2026
Merged

Pane(fix[wait_for_text]): waits for new output, joins wrapped lines, prefers wait-for-channel for completion#47
tony merged 31 commits into
mainfrom
stale-wait-for-text

Conversation

@tony
Copy link
Copy Markdown
Member

@tony tony commented May 12, 2026

Closes #45.

What changed for users

wait_for_text becomes a "wait for output written after this call" primitive instead of "match anything currently in the pane." For agents whose first poll used to return found=True on stale scrollback, this is a real fix. The CHANGES ## libtmux-mcp 0.1.x (unreleased) section describes the published-release-reader contract.

Concrete user-visible changes vs 0.1.0a6:

  • Anchors on (history_size, cursor_y) at entry; only matches lines written after the call begins.
  • content_start / content_end parameters removed (the anchor supersedes them).
  • Joins visually wrapped lines for matching via tmux's -J flag — long error strings that wrap across rows now match.
  • Rejects empty pattern, sub-10 ms interval, non-positive timeout at entry.
  • Raises ToolError on pane death, pane respawn, or clear-history mid-wait.
  • WaitForTextResult.timed_out and ContentChangeResult.timed_out removed (mechanically not found / not changed).
  • Emits a notifications/message warning when polling approaches history-limit — the band where tmux's grid_collect_history trim+scroll bounce keeps hsize clamped at the cap and the strict-shrink guard can't catch every trim.
  • Server instructions, send_keys docstring, and wait_for_text docstring all point agents at wait_for_channel (composed with tmux wait-for -S) as the deterministic primitive for command completion. The polling-scraper wait_for_text is now framed for output the agent does not author.
  • docs/topics/prompting.md system-prompt fragment and docs/topics/troubleshooting.md "Timing" bullet brought in line with the channel-first guidance.

Theoretical grounding

Reviewed against tmux source pinned at 134ba6c:

  • Capture math: cmd-capture-pane.c:158 (top = gd->hsize + n)
  • Bottom-row clip: cmd-capture-pane.c:159-160
  • Trim mechanics: grid.c:386 (grid_collect_history decrements hsize by hlimit/10)
  • Scroll-after-trim sequence: grid-view.c:106 (grid_view_scroll_region_up)
  • Resize-grow / hsize pull-back: screen.c:451-465 (screen_resize_y)
  • Pane respawn preserves hsize: screen.c:127 (screen_reinit)
  • No native trim-detection primitive exists: format.c history format strings + options-table.c hooks list both checked, no monotonic counter or history-trimmed hook

The rollover detection's partial coverage is acknowledged explicitly. A 2 ms high-frequency probe across ~3000 samples confirmed there is no sub-poll window where hsize dips below entry once entry was sub-cap — so the strict-shrink guard catches clear-history and the entry-at-cap edge but not bursty trim. The runtime ctx.warning in the trim-risk band is the agent-facing mitigation; wait_for_channel is the deterministic alternative.

Test plan

rm -rf docs/_build; uv run ruff check . --fix --show-fixes; uv run ruff format .; uv run mypy; uv run py.test --reruns 0 -vvv; just build-docs

Green at every commit. Full test suite passes.

New regression tests cover: stale scrollback no-match (#45), bottom-row clip, mid-wait respawn, pane death under remain-on-exit, input rejection (empty pattern / tiny interval / non-positive timeout), wrap-spanning patterns, resize-grow not raising, history shrinking via clear-history raising, idle wait already in trim-risk band warning, burst wait advancing into trim-risk band warning, cancellation propagation, server instruction substring assertions, send_keys docstring cross-link.

Filed follow-ups (deferred, not in this PR)

Breaking changes

  • content_start / content_end parameters removed from wait_for_text. Pre-alpha API; CHANGES carries a before/after block.
  • wait_for_text raises ToolError on empty pattern, interval < 0.01, timeout <= 0, pane death, pane respawn, and history_size shrink below entry baseline. Previously most of these silently miscaptured or returned found=False after the full timeout.
  • WaitForTextResult.timed_out and ContentChangeResult.timed_out removed. They were mechanically not found / not changed; callers should switch to the primary result field. CHANGES carries a before/after block.

@tony tony force-pushed the stale-wait-for-text branch from aa8de89 to 317fbee Compare May 12, 2026 22:58
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented May 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 85.18%. Comparing base (30da675) to head (f79625c).

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #47      +/-   ##
==========================================
+ Coverage   84.87%   85.18%   +0.31%     
==========================================
  Files          40       40              
  Lines        2268     2316      +48     
  Branches      286      295       +9     
==========================================
+ Hits         1925     1973      +48     
  Misses        260      260              
  Partials       83       83              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@tony
Copy link
Copy Markdown
Member Author

tony commented May 13, 2026

Code review

No issues found. Checked for bugs and AGENTS.md compliance.

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

@tony tony force-pushed the stale-wait-for-text branch 3 times, most recently from 5c72dd9 to 9cafb35 Compare May 15, 2026 01:04
@tony
Copy link
Copy Markdown
Member Author

tony commented May 15, 2026

Code review

Found 1 issue:

  1. CHANGES unreleased block orders the subheadings ### Breaking changes### Fixes### Dependencies, but AGENTS.md mandates ### Breaking changes### Dependencies### What's new### Fixes### Documentation### Development. Move the Dependencies block to land between Breaking changes and Fixes.

libtmux-mcp/CHANGES

Lines 7 to 60 in fbe4698

<!-- END PLACEHOLDER - ADD NEW CHANGELOG ENTRIES BELOW THIS LINE -->
### Breaking changes
**{tooliconl}`wait-for-text` drops `content_start` and `content_end`**
The baseline anchor introduced in this release supersedes the manual capture-range knobs: capture now follows the pane's grid position automatically, so the two parameters had no remaining purpose. Agents that named them should drop them from their call sites. (#45)
```python
# Before
wait_for_text(pattern="OK", content_start=-100)
# After
wait_for_text(pattern="OK")
```
### Fixes
**{tooliconl}`wait-for-text` now waits for *new* output, not stale scrollback**
Previously {tooliconl}`wait-for-text` returned `found=True` on the first poll if the pattern was already present anywhere in the pane when the call began. It now snapshots `#{history_size} + #{cursor_y}` at entry and only matches lines written below that baseline, so agents using the tool to synchronise on command output see the result they expected. For the synchronous "is the pattern in the pane right now?" use case, call {tooliconl}`search-panes` instead. (#45)
**{tooliconl}`wait-for-text` guards against tmux's bottom-row capture clip**
`capture-pane -S` clips a below-visible start back to the bottom visible row, so a cursor sitting at the last visible row (the normal state after a fresh shell prompt) would match whatever stale text already occupied that row. The poll loop now reads `#{pane_height}` at entry and short-circuits the capture when the computed start would land below the pane. (#45)
**{tooliconl}`wait-for-text` surfaces pane death and respawn**
Each poll tick now reads `#{pane_dead}` and `#{pane_pid}` alongside the grid state. A pane whose process exited under `remain-on-exit`, or one that was swapped out by `respawn-pane` mid-wait, raises a `ToolError` instead of silently miscapturing against a frozen or reset grid. (#45)
**{tooliconl}`wait-for-text` rejects footgun inputs**
Empty `pattern`, sub-10ms `interval`, and non-positive `timeout` each raise a `ToolError` at entry. The previous behaviour would have matched every line (empty pattern), spun the tmux server (zero interval), or completed a surprise single probe instead of waiting (`timeout=0`). (#45)
**{tooliconl}`wait-for-text` honours the timeout budget end-to-end**
`start_time` is now captured before the initial baseline read so a stalled tmux server cannot consume the budget before the deadline timer even starts. libtmux's `tmux_cmd` uses `Popen.communicate()` with no subprocess timeout, so the worst case was real, not theoretical. (#45)
### Dependencies
**Minimum `libtmux>=0.56.0`** (was `>=0.55.1`). Unlocks the new tmux-command wrappers shipped in libtmux 0.56.0 — {meth}`~libtmux.Pane.respawn`, {meth}`~libtmux.Pane.copy_mode`, {meth}`~libtmux.Pane.pipe`, {meth}`~libtmux.Pane.swap`, {meth}`~libtmux.Pane.paste_buffer`, {meth}`~libtmux.Pane.clear_history`, {meth}`~libtmux.Pane.display_message`, {meth}`~libtmux.Server.delete_buffer`, and the {meth}`~libtmux.Session.next_window` / {meth}`~libtmux.Session.previous_window` / {meth}`~libtmux.Session.last_window` trio — so the MCP no longer falls back to raw `cmd()` calls for tmux commands the upstream API now covers. (#46)
## libtmux-mcp 0.1.0a6 (2026-05-09)
libtmux-mcp 0.1.0a6 is the activation and registration cleanup release. It makes the server much easier for MCP clients to discover from ordinary "pane", "window", and "session" prompts, standardizes new setup docs around the `tmux` registration slug, and adds migration guidance for existing `libtmux` registrations. Existing installs keep working; the release changes defaults and documentation so new installs line up with the tool prefix users actually see.
### What's new
**Bare tmux prompts now find the server**
Agents no longer need the word "tmux" in every prompt before this server becomes relevant. The server instructions now name the positive cases users naturally write, such as "split this pane", "current window", and "this session", while steering clients away from unrelated browser tabs, editor splits, tiling-window-manager panes, and notebook cells. The discovery anchors most likely to answer those prompts - {tooliconl}`list-panes`, {tooliconl}`list-windows`, and {tooliconl}`snapshot-pane` - also carry the preload hint used by clients that keep a small always-available schema set. See {ref}`prompting` for the user-facing activation contract. (#37)
**New installs are documented as `tmux`**

(AGENTS.md rule:

libtmux-mcp/AGENTS.md

Lines 419 to 421 in fbe4698

**Fixed subheadings**, in this order when present: `### Breaking changes`, `### Dependencies`, `### What's new`, `### Fixes`, `### Documentation`, `### Development`. Dev tooling (helper scripts, internal automation) lives under `### Development`. For breaking changes, show the migration path with concrete inline code (e.g. a `# Before` / `# After` fenced code block). Dependency floor bumps use the form ``Minimum `pkg>=X.Y.Z` (was `>=X.Y.W`)``.
)

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

@tony tony force-pushed the stale-wait-for-text branch 3 times, most recently from afea390 to 273d08f Compare May 16, 2026 09:40
tony added 19 commits May 16, 2026 07:43
…onger matches (#45)

why: wait_for_text returned found=True on the first poll whenever the
     pattern was already present in the pane when the call began, so
     agents using it to synchronise on command output saw the wrong
     result. The sibling wait_for_content_change already snapshots a
     baseline; wait_for_text now mirrors that pattern.

what:
- Snapshot (history_size, cursor_y) at entry; compute baseline_abs
  as the absolute grid index. Each poll re-reads history_size and
  captures from baseline_abs - hs_now + 1 onward, so the matched
  region tracks the pane's grid even as content scrolls into history.
- Add _read_grid_position and _read_history_size helpers that expand
  the corresponding tmux format strings via display-message.
- Drop content_start / content_end parameters; the baseline anchor
  supersedes them. Pre-alpha API contract makes this a clean drop.
- Rewrite the docstring around the new "wait for new appearance"
  semantics, cross-reference search_panes for synchronous matches,
  and document edge cases (scrollback truncation, reverse-index
  sequences, clear/reset).
…#45)

why: The baseline-anchor fix changes the user-visible contract for
     wait_for_text: a pattern already in the pane before the call
     must now return found=False, not the stale match the old code
     produced. The test suite needs to encode that contract so a
     future regression lands red.

what:
- Restructure WAIT_FOR_TEXT_FIXTURES with separate pre_command and
  during_command fields so fixtures can model "pattern was already
  on screen when wait_for_text started."
- Add stale_scrollback_does_not_match: emit the pattern before
  wait_for_text starts and assert found=False on timeout.
- Add matches_new_output_after_baseline: start the wait task, sleep,
  emit the marker via send_keys, assert found=True. Covers the
  positive path the baseline anchor was designed for.
why: The libtmux 0.56 bump merged on trunk ships Pane.display_message
     as a typed wrapper around display-message, and every other tool
     module in this package adopted it. The two helpers introduced for
     baseline anchoring (_read_grid_position, _read_history_size)
     reached back to raw pane.cmd("display-message", ...) instead.
     Bringing them in line keeps the codebase consistent with the
     same release-notes entry that markets the wrapper migration.

what:
- _read_grid_position and _read_history_size call
  pane.display_message(fmt, get_text=True) directly; the parse logic
  and empty-output fallback are unchanged.
- Rename the local "result" binding to "stdout" to match the call
  surface used in meta.py.
why: capture-pane -S clips a below-visible start back to the bottom
     visible row (cmd-capture-pane.c, in every tmux release we
     support). When the cursor sits at the last visible row at entry,
     start_line = cy0 + 1 lands below the pane and tmux returns the
     bottom row's content. Stale text on that row — a fresh shell
     prompt, a printed marker — matches the pattern instantly, before
     any new output has appeared. The baseline-anchor fix that
     motivated this branch did not close this corner: the math is
     correct, but it assumed an empty capture result when start_line
     pointed below the visible region.

what:
- Add _read_pane_height alongside _read_history_size. Read pane_height
  once at entry; the bottom-row clip fires on entry-state, and a
  per-tick read costs another tmux round-trip without changing
  anything for the common case.
- Before each capture, short-circuit lines to [] when
  start_line >= pane_height. Route through the existing deadline /
  sleep tail so the loop honours the timeout instead of tight-
  spinning.
- tests/test_pane_tools.py: regression for #45. Respawn the pane with
  a sh -c that pre-fills, prints the marker without a trailing
  newline, and sleeps — freezing hsize and cursor_y for the wait
  window so the guard's first-tick contract is what's actually
  tested.
…dget

why: start_time was captured after the initial _read_grid_position
     and _read_pane_height calls, so a stalled tmux server could
     block the tool for an arbitrary period before the user-supplied
     timeout deadline even started. libtmux's tmux_cmd uses
     Popen.communicate() without a subprocess timeout, so the worst
     case is real, not theoretical: a server-side stall on the first
     display-message leaks past the wall-clock budget the caller
     asked for.

what:
- Move start_time and deadline to the start of the resolved-pane
  block, before the baseline and pane-height reads. Same total work,
  no new branches; only the budget accounting changes.
why: Three parameter values turn the wait loop into a footgun rather
     than a wait primitive:

     - ``pattern=""`` — ``re.compile('')`` matches the zero-width
       position of every line, so the first poll returns
       ``found=True`` against whatever was already in the pane.
     - ``interval=0`` — ``asyncio.sleep(0)`` yields the event loop
       without idling, so the loop fires tmux subprocesses as fast
       as the scheduler hands them out (a self-inflicted server-
       side DoS).
     - ``timeout<=0`` — the loop body runs one probe before the
       deadline check, so a non-positive timeout silently turns
       "wait" into "synchronous probe" without saying so.

     Each is rejected at function entry so the failure surfaces as
     an explicit ``ToolError`` instead of corrupting the contract.

what:
- Inline parameter checks before the regex compile in wait_for_text;
  raise ToolError with the offending value in the message.
- tests/test_pane_tools.py: one test per rejection, asserting the
  matching ToolError message.
why: The baseline-anchor mechanism assumes the pane's process and
     grid stay coherent for the duration of the wait. Two events
     break that assumption silently:

     - respawn-pane mid-wait. screen_reinit (screen.c) resets cy to 0
       but preserves hsize, so the absolute anchor keeps pointing at
       the old process's grid — the new process's output ends up at
       rows the capture skips, and the wait times out as if nothing
       arrived.
     - pane death. With remain-on-exit set, tmux keeps the pane alive
       after its child exits and #{pane_dead} flips to 1. Without
       detection, the loop just keeps polling a stale frozen grid
       until the deadline.

     Both cases now surface as explicit ToolErrors so callers can
     react instead of timing out in the dark.

what:
- Replace _read_grid_position, _read_history_size, and
  _read_pane_height with a single _PaneState NamedTuple read via one
  display-message round-trip. Per-tick subprocess count drops from
  three reads (entry + per-tick history + per-tick we'd need height)
  to two (state + capture) without losing any fields.
- Capture pane_pid at entry; each tick checks pane_dead first and
  then verifies pane_pid is unchanged. Either condition raises
  ToolError naming the affected pane and the transition.
- tests/test_pane_tools.py: regression for respawn (asyncio.gather
  pattern: start wait_task, respawn after 0.1s, expect ToolError)
  and for pane death (set remain-on-exit, respawn into ``true``,
  poll #{pane_dead}, expect ToolError).
…me-row blind spot

why: Two parts of the wait_for_text docstring oversold the
     baseline-anchor model:

     - The reverse-index note claimed ``\eM`` "rewrites history
       below the baseline." It does not. screen_write_reverseindex
       (screen-write.c) scrolls the visible region within
       [rupper, rlower] or decrements cy. ``hsize`` is never
       touched. A maintainer reading the tmux source to verify a
       related claim would lose trust in the docstring.
     - The "What 'new' means" block sells the tool as observing
       NEW text without saying that in-place updates to the entry
       cursor row (carriage-return rewrites, progress spinners,
       single-line status updates) are excluded by design. Users
       wiring this against TUIs that update in place see a silent
       timeout.

what:
- Rewrite the reverse-index note around the real hazard — in-place
  repaints below the baseline — and cite screen-write.c for the
  visible-region scroll behaviour rather than asserting history is
  rewritten.
- Add a paragraph under "What 'new' means" that explicitly
  disclaims same-row rewrites and points readers at
  wait_for_content_change for the in-place case.
…md format

why: The prior CHANGES entries on this branch carried commit-message
     shape (why: / what: bodies, mechanism-heavy bullets) into a
     changelog that AGENTS.md specifies as user-vocabulary prose
     under Bold subheadings. Three CHANGES commits were dropped from
     the branch via rebase; this single commit replaces them with
     prose entries in the project's actual changelog format.

what:
- Breaking changes: parameter drop with before/after migration block.
- Fixes: one prose paragraph covering the baseline-anchor behaviour
  change (stale-scrollback fix, bottom-row clip handling,
  death/respawn surfacing) under a single user-vocabulary subheading.
- Fixes: one prose paragraph covering input validation.
- Section ordering: Breaking changes / Dependencies / Fixes, per
  AGENTS.md line 420.
…'t match the implementation

why: Two comments in tests/test_pane_tools.py — the WaitForTextFixture
     docstring and the docstring on
     test_wait_for_text_matches_new_output_after_baseline — describe
     the positive-path coordination as using asyncio.gather. The
     implementation actually uses asyncio.create_task + a sequenced
     await on the emit coroutine, because gather would schedule both
     coroutines concurrently and lose the start-baseline-then-emit
     ordering the test relies on. A future maintainer following the
     comments would either rewrite the test incorrectly or doubt the
     existing shape.

what:
- WaitForTextFixture pre_command docstring: replace the
  asyncio.gather mention with asyncio.create_task plus a sequenced
  await.
- test_wait_for_text_matches_new_output_after_baseline docstring:
  describe the actual create_task + sequenced-await mechanism and
  note explicitly that asyncio.gather is unsuitable here because it
  loses the ordering guarantee.
…t precedent

why: The _stale_settled helper inside the wait_for_text parametrized
     test waited 2 seconds for (history_size, cursor_y) to stabilise
     across two consecutive polls. Every other settle-loop in this
     file (_pane_dead, _ready, _bottom_row_ready) uses a 5-second
     budget for similar tmux-state stability under shell activity.
     The 2-second outlier risked flaking on CI nodes whose zsh
     async-prompt hooks keep firing for longer than the budget,
     missing the two-consecutive-identical-reads window that
     signals settlement.

what:
- retry_until(_stale_settled, 2, raises=True) becomes
  retry_until(_stale_settled, 5, raises=True). Happy-path wall-clock
  is unaffected (the predicate returns True as soon as state
  stabilises, usually within milliseconds); only the failure-mode
  budget widens.
…nchor stays sound

Adds a rollover guard inside the poll loop: when ``state.history_size <
entry.history_size`` (grid_collect_history fired, clear-history ran, or the
session's history-limit option was shrunk mid-wait), the absolute baseline
``hs0 + cy0`` is no longer recoverable. There is no server-side way to
disambiguate "trimmed" from "still anchored", so surface the lost anchor
as ToolError with a steer toward wait_for_channel for deterministic
synchronization.

Also fixes the stale ``pane_height`` reference in the bottom-row clip
guard — it was captured once at entry and never refreshed, so a pane
resize mid-wait left the guard keyed to a stale height. Compare against
``state.pane_height`` re-read each tick so the guard tracks the live
visible region.

Refs tmux grid.c grid_collect_history; verified against tmux master HEAD.
Passes ``join_wrapped=True`` (tmux ``-J``) to ``Pane.capture_pane`` so a
pattern that crosses the pane's visual wrap is matched against the joined
logical line. Without this, long error strings like
"Build failed: module not found" got split across two rows by tmux and a
naive ``re.search`` against each row in isolation never matched.

The returned ``matched_lines`` entry for a wrap-crossing hit is the joined
line and can therefore be longer than ``pane_width``; documented in the
Notes section so agents that bound output understand the implication.
…nistic primitive

Re-frames the wait family so agents pick the cheaper, race-free option
first. Three coordinated doc edits:

* ``send_keys`` docstring now forks the post-send choice into deterministic
  (``wait_for_channel`` + ``tmux wait-for -S``), pattern-match
  (``wait_for_text`` for output the agent doesn't control), and any-change
  (``wait_for_content_change``).
* Server ``_INSTR_WAIT_NOT_POLL`` system instruction names
  ``wait_for_channel`` first; the 2 KB instruction budget is preserved by
  tightening the supporting prose.
* ``wait_for_text`` docstring gains a ``When NOT to use this`` section
  that names the sequential-send_keys race and steers callers to the
  channel pattern with a pointer at the ``run_and_wait`` recipe.

Tests:
- ``test_base_instructions_prefer_wait_over_poll`` now requires
  ``wait_for_channel`` to appear in the system instructions and to be
  named before the fallbacks.
- ``test_send_keys_docstring_cross_links_wait_for_channel`` is a cheap
  regression against future docstring drift.
The Phase 1 rollover predicate (``state.history_size < entry.history_size``)
fires correctly on real row eviction — ``grid_collect_history`` trim,
``clear-history``, ``set-option history-limit <smaller>`` on tmux master —
but ALSO fires on resize-grow when ``hscrolled > 0``. tmux's
``screen_resize_y`` (screen.c:451-465) decrements ``gd->hsize`` on a
vertical grow by pulling rows from history back into the visible region.
The rows are not freed; only the history/visible-region partition shifts
and absolute indices stay valid. The Phase 1 predicate raised a spurious
``ToolError`` in that case.

Trim and resize-grow are distinguished by ``pane_height``: trim leaves it
unchanged, resize-grow increases it. Adding ``state.pane_height <=
entry.pane_height`` as a conjunct makes the predicate the actual signature
of row eviction, exempting resize-grow.

Also walks back two over-broad claims that shipped with Phase 1:
the "monotonically non-decreasing" framing in CHANGES (resize-grow proves
it is not monotone) and the parenthetical about "shrinking history-limit
mid-wait" — that retroactive-trim path requires tmux commit 195a9cf which
is not in any tagged release as of tmux 3.6a.

Adds test_wait_for_text_survives_resize_grow_with_scrolled_history.
… guard catches a subset

A 2 ms high-frequency repro over ~3000 hsize samples showed the rollover
guard's strict-inequality predicate (state.history_size < entry.history_size)
never observes a dip during real history-limit overflow: tmux's
grid_collect_history trims (~10% of hlimit) then immediately scrolls new
lines back, so sampled hsize stays clamped near the cap. The predicate
only fires when entry.history_size happened to land exactly at hlimit
(environment luck). All earlier claims that "wait_for_text raises on
history rollover" were therefore over-broad.

Walk back the contract:

* Docstring's "Scrollback rollover raises" Note now says "rollover
  detection is partial" and enumerates what is and isn't caught
  (clear-history yes; retroactive history-limit shrink on tmux >= 3.7
  yes; grid_collect_history trim during continuous output NO). Points
  to the upcoming risk-band warning and to wait_for_channel for
  deterministic synchronization.
* ToolError message changes from "history rolled over during wait" to
  "history shrank below entry baseline" -- describes what we actually
  detected (a shrink), not what we assumed (a rollover).
* CHANGES breaking-change entry: same walk-back; references the new
  ### Fixes warning entry and the tracking issue filed in libtmux.
* test_wait_for_text_raises_on_history_rollover renamed to
  test_wait_for_text_raises_when_history_is_cleared, since that is
  the path it actually exercises. Docstring trimmed to describe the
  test mechanism, with no dev-history narrative.
* Same dev-history narrative removed from
  test_wait_for_text_survives_resize_grow_with_scrolled_history
  (same anti-pattern, same project guidance about not referencing
  the task or fix history in code comments).

No behavior change beyond the message text and test rename. Phase 7
follow-up adds the risk-band warning predicate that catches what this
guard does not.
… band

The strict-shrink rollover guard misses grid_collect_history trim during
continuous output: tmux trims ~10% of history-limit then immediately
scrolls a line back, so sampled hsize stays clamped at the cap and the
predicate state.history_size < entry.history_size never fires. A 2 ms
high-frequency probe across ~3000 samples confirmed there is no sub-poll
window where hsize dips below entry once entry was sub-cap.

Emit a one-shot ctx.warning when sampled state enters the trim-risk
band (top 10% of history-limit) AND the pane has advanced since the
wait's baseline. MCP clients subscribed to notifications/message see
"correctness is best-effort here. For deterministic synchronization
use wait_for_channel." and can decide whether to keep waiting, retry,
or switch primitives. Warning is one-shot per call so we don't spam.

Adds history_limit: int to _PaneState; the field is read in the same
display-message round-trip as the other per-tick fields, so the loop
still costs two subprocesses per tick (state + capture).

No CHANGES update in this commit -- the consolidated wait-for-text
CHANGES entry will land in a later commit per the Narrative Bleed
doctrine (no published-release user saw "no warning" to be "fixed";
this warning is just part of the new contract).
… output

Adds the test the rollover claim should have had from the start. Earlier
test_wait_for_text_raises_when_history_is_cleared exercises clear-history
only -- not the grid_collect_history trim that fires during bursty
output. This new test covers the actual real-rollover path.

Shape: set history-limit=50 globally, split a fresh pane that inherits
the small limit, attach a stub Context that records ctx.warning calls,
start wait_for_text against a never-appearing pattern, burst 200 echo
lines through the pane. tmux's grid_collect_history fires repeatedly,
sampled history_size enters the risk band (>= 45), and the wait emits
the one-shot trim-risk-band warning.

The wait's found / timed_out result is intentionally not asserted --
the contract is best-effort once polling enters the risk band, so
pinning a specific result would be over-specifying what tmux's grid
model can't actually guarantee. The test pins the warning contract
(what the tool guarantees) only.
…t for a published-release reader

CHANGES walk-back and Phase 5 "spurious resize-grow" entries described
intra-branch transitions a 0.1.0a6 user never saw. Consolidate the
wait-for-text section into entries that describe only what the published
release reader gets:

* Breaking changes: one entry under "waits for new output, not stale
  scrollback" covers the #45 fix (real published bug), the new
  baseline-loss ToolErrors, the resize-grow exemption, the trim-risk-band
  warning, and the wait_for_channel cross-link. Reader needs zero
  branch context.

* Fixes: keep the wrapped-line match and footgun-rejection entries
  (both real published bugs). Drop the "no longer raises spuriously
  on resize-grow" entry (phantom fix: 0.1.0a6 had no rollover guard
  to raise, spuriously or otherwise). The duplicate "waits for new
  output" Fixes entry is also dropped -- the consolidated Breaking
  changes entry covers it.

Docstring Notes section in wait.py scrubbed of the tmux-commit-SHA
citation and the forward-looking libtmux tracking-issue cross-reference.
The tracking issue exists (filed at libtmux as part of this PR) but
that's branch-internal narrative; users running the shipped wait_for_text
don't need to know about a follow-up investigation issue to use the
tool correctly.
@tony tony force-pushed the stale-wait-for-text branch from 273d08f to 91ea744 Compare May 16, 2026 13:01
@tony tony changed the title Pane(fix[wait_for_text]): anchor on baseline so stale scrollback no longer matches Pane(fix[wait_for_text]): waits for new output, joins wrapped lines, prefers wait-for-channel for completion May 16, 2026
@tony
Copy link
Copy Markdown
Member Author

tony commented May 16, 2026

Code review

Found 2 issues:

  1. CHANGES unreleased section ordering still violates AGENTS.md (the May 15 review's finding remains unresolved). AGENTS.md says "Fixed subheadings, in this order when present: ### Breaking changes, ### Dependencies, ### What's new, ### Fixes, ### Documentation, ### Development." Current order is Breaking changesDocumentationDependenciesFixes. ### Documentation must come last (after ### Fixes), and ### Dependencies must come before ### Fixes.

libtmux-mcp/CHANGES

Lines 28 to 44 in 91ea744

Trim that fires during continuous output cannot be reliably detected from `history_size` alone: tmux's trim-then-scroll bounce keeps sampled `hsize` clamped near `history-limit`. When polling approaches that band, the tool emits a `notifications/message` warning so MCP clients can decide whether to keep waiting, retry, or switch to {tooliconl}`wait-for-channel`. For deterministic command-completion synchronization, compose `tmux wait-for -S <channel>` into the shell command and call {tooliconl}`wait-for-channel`. (#45)
### Documentation
**Wait family is re-framed around {tooliconl}`wait-for-channel` as the deterministic primitive**
The {tooliconl}`send-keys` docstring, the server system instructions, and the {tooliconl}`wait-for-text` docstring now point agents at {tooliconl}`wait-for-channel` with composed `tmux wait-for -S` for command completion, and reserve {tooliconl}`wait-for-text` / {tooliconl}`wait-for-content-change` for output the agent does not author. The `run_and_wait` recipe is the canonical status-preserving pattern. (#45 follow-up)
### Dependencies
**Minimum `libtmux>=0.56.0`** (was `>=0.55.1`). Unlocks the new tmux-command wrappers shipped in libtmux 0.56.0 — {meth}`~libtmux.Pane.respawn`, {meth}`~libtmux.Pane.copy_mode`, {meth}`~libtmux.Pane.pipe`, {meth}`~libtmux.Pane.swap`, {meth}`~libtmux.Pane.paste_buffer`, {meth}`~libtmux.Pane.clear_history`, {meth}`~libtmux.Pane.display_message`, {meth}`~libtmux.Server.delete_buffer`, and the {meth}`~libtmux.Session.next_window` / {meth}`~libtmux.Session.previous_window` / {meth}`~libtmux.Session.last_window` trio — so the MCP no longer falls back to raw `cmd()` calls for tmux commands the upstream API now covers. (#46)
### Fixes
**{tooliconl}`wait-for-text` matches patterns across visually-wrapped lines**

AGENTS.md rule:

libtmux-mcp/AGENTS.md

Lines 419 to 421 in 91ea744

**Fixed subheadings**, in this order when present: `### Breaking changes`, `### Dependencies`, `### What's new`, `### Fixes`, `### Documentation`, `### Development`. Dev tooling (helper scripts, internal automation) lives under `### Development`. For breaking changes, show the migration path with concrete inline code (e.g. a `# Before` / `# After` fenced code block). Dependency floor bumps use the form ``Minimum `pkg>=X.Y.Z` (was `>=X.Y.W`)``.

  1. tests/test_pane_tools.py:2074 uses bash/zsh brace expansion {1..100} while every other pre-fill in the same file uses POSIX-portable $(seq 1 N) (8 other call sites: lines 1332, 1519, 1563, 1601, 1662, 1716, 2018). When the new pane's shell is dash or any non-bash POSIX /bin/sh, the literal string {1..100} is iterated once and produces no scrollback — the _prefilled retry_until then exhausts its budget before wait_for_text ever runs. Verified across all tmux 3.2a–master checkouts: shell selection sits below tmux, this is purely a shell-portability issue. Use $(seq 1 100) to match the file's convention.

# history-limit is 50. Risk floor (top 10%) is 45.
# Print 100 lines to ensure hsize reaches the cap (50).
fresh_pane.send_keys("for i in {1..100}; do echo line$i; done", True)
def _prefilled() -> bool:

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

tony added 2 commits May 16, 2026 08:50
…y_limit once

The risk-band warning's `advanced` conjunction
(state.history_size > entry.history_size OR state.cursor_y != entry.cursor_y)
gated the warning on the wait having moved since entry. Three real cases
that warning was meant to cover slipped past it:

* Entry already in the risk band, idle wait -- no output happens, nothing
  advances, warning never fires even though the baseline is in dangerous
  territory from tick zero.
* Cursor pinned at the bottom row of the visible region -- subsequent
  scrolls advance history_size but leave cursor_y unchanged; if hsize
  also stayed at the cap across two polls, advanced returned False.
* Trim+scroll bounce keeping hsize stable at the cap -- entry at 49,
  every subsequent poll also sees 49; advanced returns False.

Drop the conjunction. The one-shot warned_risk_band flag is already the
right noise filter -- it limits warnings to one per call regardless of
how long the wait sits in the band. Firing on any tick where
state.history_size >= risk_floor catches all three cases.

Also cache baseline_hlimit at entry instead of re-reading
state.history_limit each tick. history-limit is fixed at pane creation
(retroactive change only lands in tmux 3.7+), so per-tick reads are
redundant. The warning message uses the cached value too.

Adds test_wait_for_text_warns_when_already_in_risk_band -- pre-fills a
fresh pane past the risk floor, then runs an idle wait with no further
output. Asserts the warning fires anyway. Pairs with the existing
test_wait_for_text_warns_in_history_limit_risk_band which covers the
advance-into-the-band path.
…ooting at wait_for_channel

Two doc surfaces still recommended the send_keys -> wait_for_text ->
capture_pane pattern that the wait-family re-framing walked back:

* docs/topics/prompting.md has a "For general tmux workflows" code
  block tagged ``:class: system-prompt`` -- specifically meant for
  agents to copy into their AGENTS.md / CLAUDE.md / .cursorrules.
  Higher leverage than tool descriptions because it shapes the
  agent's mental model before any tool call.

* docs/topics/troubleshooting.md item 3 ("Timing") had the same
  recommendation in a "what to check" bullet -- lower visibility
  but still misleading.

Both now lead with wait_for_channel + composed tmux wait-for -S as
the deterministic primitive for command completion, and demote
wait_for_text / wait_for_content_change to "output you don't author"
duty. Matches the send_keys docstring, server instructions, and the
wait_for_text "When NOT to use this" section.

No CHANGES update -- the underlying re-framing already shipped under
### Documentation. These are stragglers being brought in line with
the announced change.
@tony tony force-pushed the stale-wait-for-text branch from 91ea744 to 82cc9cd Compare May 16, 2026 13:52
tony added 8 commits May 16, 2026 09:00
…proportions and ordering

Three corrections against the Changelog Conventions in AGENTS.md:

* **Section order.** The fixed subheading order is Breaking changes ->
  Dependencies -> What's new -> Fixes -> Documentation -> Development.
  The unreleased block had Documentation in the wrong slot (between
  Breaking changes and Dependencies) and Fixes ahead of Documentation
  out of order. Reordered to match.

* **Proportional prose.** Dropped backend-mechanism phrasing
  ("trim-then-scroll bounce", "sampled hsize clamped near history-limit",
  "no longer falls back to raw cmd() calls") that violates the
  "internal jargon" anti-pattern. The big breaking-change entry is now
  three short paragraphs in user vocabulary (what changed, what
  raises, what the warning means). The Dependencies entry no longer
  walls-of-text 9 specific {meth} method names; it describes the
  capability ("typed wrappers for tmux commands the server invokes").

* **PR ref form.** Replaced the lone `(#45 follow-up)` qualifier with
  bare `(#45)` to match the project's pattern across 25+ other entries.

The Documentation entry also picks up the doc-fragment work that
Phase 12 + Phase 14a actually shipped (quickstart, gotchas, prompting,
troubleshooting, recipes, send-keys topics), not just the three
docstring surfaces named originally.

The reordered breaking change pair leads with the bigger semantic
change (waits for new output) and follows with the syntactic break
(content_start / content_end removal) so readers see the meatier
fix first.
…_wait recipe

The run_and_wait prompt template appended ``exit $__mcp_status`` to its
shell payload to preserve the command's exit status. In an interactive
shell that exits the shell itself; with a single-pane session, the
session and tmux server exit with it. Reproduced: channel signals,
then list-panes / capture-pane fail with "no server running."

The equivalent example in docs/tools/pane/wait-for-channel.md was
worse -- ``pytest; status=$?; tmux wait-for -S tests_done; exit $status``
uses ``status`` (read-only in zsh) AND ``exit $status`` (kills the
shell in any shell). Reproduced: ``zsh: read-only variable: status``
followed by a deadlocked wait-for.

Replace both with the simpler ``{command}; tmux wait-for -S {channel}``
form. Shell ``;`` semantics fire the wait-for whether the command
succeeds or fails, so the edge-triggered signal still never deadlocks
on a crashed command -- which was the safety property the original
``exit $status`` chain was supposed to provide. Drop the
"preserving the command's exit status" claim from the recipe
docstring and rendered prose; preserving the status in an interactive
shell requires out-of-band routing (a file, a tmux user variable, a
scratch pane) that doesn't belong in a teaching recipe.

The existing test pinned ``exit $__mcp_status`` as a required token
in the rendered template -- that assertion is inverted to lock the
new safe contract. Tests still pin the channel naming pattern and
the wait-for suffix.

Adds a Fixes CHANGES entry: the run_and_wait prompt shipped in
0.1.0a2, so this is a real user-facing bug for 0.1.0a6 users.
…at entry

The cursor-position anchor captures rows below the entry cursor each
tick, which means rows that already contained content at entry got
matched by the regex on the first poll. Reviewer reproduced with pane
state 0:0:10 and STALE_BELOW painted on row 1: the wait returned
found=True with matched_lines=['STALE_BELOW'] in 10 ms. The docstring
claim that wait_for_text matches "lines written after the call starts"
was broader than the implementation could honestly deliver.

Snapshot the rows below the entry cursor once at entry as a frozenset.
On every tick, drop captured lines whose content is already in that
set before running the regex. Result is an honest content-delta
contract: "row exists at start_line onward AND was not at that
content at entry time". Stale paint-style writes (TUI repaints,
paste-text, manual cursor positioning) get filtered; new output
appended below the entry cursor still matches; in-place updates of
rows that pre-existed at entry are also filtered (false-negative
edge case the docstring now calls out).

Trade-off: one extra capture-pane call at entry. The frozenset
lookup is O(1) per line. Per-tick cost is unchanged in subprocess
count.

Adds test_wait_for_text_ignores_stale_below_cursor as the regression
the reviewer's reproduction would have caught: respawn the pane with
``printf 'TOP\nSTALE_BELOW\n'; printf '\033[H'; sleep 60`` (the
ESC[H sequence moves the cursor back to (0,0) so STALE_BELOW sits
on row 1 below the cursor), then assert wait_for_text(STALE_BELOW)
returns found=False.

No CHANGES update: this tightens behavior introduced earlier in the
same PR (the baseline-anchor design). A 0.1.0a6 user never saw the
cursor anchor at all, so the filter is not a user-visible fix — it's
the docstring contract getting honest in the same release.
…remaining surfaces

Phase 12 + Phase 14a swept six doc surfaces; a follow-up review caught
three more:

* docs/tools/index.md "Running a command?" entry led with
  send-keys -> wait-for-text + capture-pane. Now leads with the
  channel-composed pattern for authored commands, with wait-for-text
  and wait-for-content-change demoted to "output you don't author."

* docs/recipes.md pytest workflow used wait-for-text(pattern="passed|failed|error").
  Pytest is the agent's authored command -- the deterministic
  composition is "pytest; tmux wait-for -S pytest_done" + wait-for-channel.
  Updated the Act paragraph to match.

* src/libtmux_mcp/prompts/recipes.py interrupt_gracefully kept wait_for_text
  on purpose -- C-c is a signal, not a command, so there is no shell
  statement to compose tmux wait-for -S into. Added a clarifying note
  in the rendered template so future readers don't apply the channel
  re-framing reflexively.

No CHANGES update: the wait-family re-framing already has its entry
under ### Documentation. This just completes the sweep.
… what early-anchor measures

The pre-baseline start_time anchor was annotated as preventing a
stalled tmux server from blowing the user-supplied timeout. That
overstated what the early anchor does. libtmux's tmux_cmd uses
Popen.communicate() with no subprocess timeout, so a hung tmux call
blocks indefinitely regardless of where start_time was anchored.

The early anchor's actual value: WaitForTextResult.elapsed_seconds
reflects total call duration including the baseline read, so a hung
tmux call's blowout is visible in the returned result. Measurement,
not prevention.

Comment-only change. No CHANGES, no behavior delta.
…ion than wait_for_text

This branch hardens wait_for_text against pane respawn, pane death, and
clear-history mid-wait, and re-frames the wait family as a fallback ladder
that ends at wait_for_content_change. But the sibling tool itself never
gained those guards -- so an agent that follows the recommended fallback
order lands on the tool with the weakest baseline-loss detection. Respawn,
death, and clear-history surface there as silent changed=True returns
instead of ToolError.

Add a single docstring paragraph naming the gap and pointing back at
wait_for_channel for correctness-sensitive flows. Full parity with
wait_for_text is filed as #53 -- this commit closes the disclosure gap
so agents see the asymmetry from the tool description directly, not
only from following the issue trail.

Verified via the local MCP probe: wait_for_content_change now exposes
the wait_for_text / ToolError / wait_for_channel tokens in its
description across all four configured clients.
…m tool docstrings

The recipe fix in #47 replaced run_and_wait's killing payload
(`cmd; status=$?; tmux wait-for -S name; exit $status`) with the safe
`cmd; tmux wait-for -S name` form, because `exit $status` in an
interactive shell exits the shell itself and destroys single-pane
sessions. The sweep missed the two surfaces where it matters most:

* wait_for_tools.py module docstring (Safety section) -- told every
  agent reading it to use the killing payload.
* wait_for_channel function docstring (composition example) -- the
  same payload appears as the canonical composition example, with a
  body paragraph explicitly recommending `status=$?; ...; exit
  $status` as "important."

Both become MCP tool descriptions at the protocol level, so agents
learn the destructive pattern directly from the channel tool that
this PR positions as the deterministic primitive. That is the
opposite of the framing the rest of the branch establishes.

Replace both example payloads with `pytest; tmux wait-for -S
tests_done` (the form recipes.py renders since the recipe fix), and
add a "Do NOT chain exit $status" warning explaining the failure
mode so an agent that has read older docs can correct course.
Exit-status preservation in interactive shells is named as
out-of-scope to match the recipe's stance.

Verified via the local MCP probe against the swapped `tmux` server
on claude / codex / cursor / gemini: wait_for_channel now teaches
the safe pattern and contains no unguarded `exit $status` token.
…s surfaces

The recipe fix in #47 removed exit-status preservation from the
run_and_wait recipe itself but the supporting prose that describes
what run_and_wait produces was not swept. Five surfaces still
advertised the recipe as status-preserving:

* docs/prompts.md sample render -- showed the OLD payload
  `pytest; __mcp_status=$?; tmux wait-for -S name; exit $__mcp_status`
  in the run_and_wait sample-render block, including the body
  paragraph that justified the exit-status idiom. Readers of the
  prompts catalogue page learned the killing pattern even though the
  rendered prompt itself no longer emits it.
* src/libtmux_mcp/tools/pane_tools/io.py send_keys docstring --
  referred to run_and_wait as "the canonical status-preserving
  pattern." The send_keys description is the entry point most agents
  read for picking a sync primitive; pointing them at a status-
  preserving recipe that no longer preserves status is a contract lie.
* src/libtmux_mcp/tools/pane_tools/wait.py wait_for_text docstring
  -- carried "safe, status-preserving composition" in the channel
  cross-link.
* CHANGES unreleased entry -- the Documentation bullet for the
  channel-first re-framing closed with "The run_and_wait recipe
  shows the canonical status-preserving pattern."
* docs/demo.md -- the worked examples and "fundamental pattern"
  prose still chained `send_keys -> wait_for_text -> capture_pane`,
  rather than the channel-first pattern this PR establishes.

Replace "status-preserving" with "safe-completion" on the
io.py / CHANGES surfaces; rewrite the prompts.md sample render to
match the actual run_and_wait output (now `cmd; tmux wait-for -S
channel`) with the recipe's explicit "does not preserve exit status"
disclaimer; switch demo.md's worked examples and "fundamental
pattern" sentence to the wait_for_channel composition.

Verified via the local MCP probe across claude / codex / cursor /
gemini: send_keys description now contains "safe-completion" and no
surface teaches the OLD payload string.
@tony tony force-pushed the stale-wait-for-text branch from 989322a to 0ded558 Compare May 16, 2026 17:07
tony added 2 commits May 16, 2026 12:36
WaitForTextResult.timed_out was mechanically `not found`;
ContentChangeResult.timed_out was mechanically `not changed`. There
is no third state — every wait outcome is either match-found or
timed-out, so the second boolean carries no information beyond the
primary result field. The redundancy was visible in the test suite
as `assert result.timed_out is (not expected_found)`, asserting the
identity directly.

Drop the field from both Pydantic models. Drop the assignments in
the two return sites in wait.py. Drop the redundant assertions in
test_pane_tools.py — each was paired with an equivalent assertion
on `found` or `changed`. Replace the now-unused-result lints in the
two timeout-warning tests with affirmative `assert result.found is
False` / `assert result.changed is False` so the timeout outcome
stays covered alongside the warning-emission contract. Update the
timed_out references in test docstrings and in the JSON examples on
the wait-for-text / wait-for-content-change docs pages. CHANGES
gains a Breaking changes bullet.

Pre-alpha, no compat shim. Agents that read `result.timed_out`
should switch to `not result.found` / `not result.changed`.

Verified via the local MCP probe across the swapped clients and the
full quality-gate chain: 462 tests pass, ruff and mypy clean,
build-docs succeeds, tool descriptions still teach the safe pattern.
…able

why: AGENTS.md "Changelog Conventions" requires each deliverable to
     have a bold subheading naming it in user vocabulary, with
     cross-references in the prose body. The previous heading led
     with two `{class}` rST roles that crowded the user-visible
     change ("drop `timed_out`") and pushed it to the end of a long
     line.

what:
- Rename the bold subheading to "Wait result models drop
  `timed_out`" — names the deliverable, fits on one line, reads as
  user vocabulary.
- Move the two `{class}` cross-references for WaitForTextResult and
  ContentChangeResult into the opening sentence of the body where
  they render inline.
- Body wording unchanged otherwise; the migration guidance and PR
  reference stay intact.
@tony tony merged commit b180d99 into main May 16, 2026
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

wait_for_text matches stale scrollback immediately — needs baseline anchor

2 participants