Harden tray<->gateway keep-alive and reconnect lifecycle #627
Codex review: needs real behavior proof before merge.
Reviewed June 18, 2026, 8:19 PM ET / 00:19 UTC.
Summary
Reproducibility: Source-reproducible, but not live-reproduced in this review. The linked issue provides a real reconnect/re-approval scenario, and the heartbeat finding follows from the PR source path where missing req/ping responses abort the socket.
Review metrics: 2 noteworthy metrics.
Merge readiness
Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.
Rank-up moves:
Proof guidance:
Risk before merge
Maintainer options:
Next step before merge
Security Review findings
Review details
Best possible solution: Refresh on current main, preserve the merged connection/pairing fixes, make the heartbeat compatible with unsupported gateways or a documented health RPC, add focused coverage, and attach redacted real-gateway proof.
Do we have a high-confidence way to reproduce the issue? Source-reproducible, but not live-reproduced in this review. The linked issue provides a real reconnect/re-approval scenario, and the heartbeat finding follows from the PR source path where missing req/ping responses abort the socket.
Is this the best way to solve the issue? No, not yet. The lifecycle ownership direction is plausible, but the default application heartbeat needs a compatibility-safe contract, and the branch needs a current-main refresh plus real behavior proof.
Full review comments:
Overall correctness: patch is incorrect
AGENTS.md: found and applied where relevant.
Codex review notes: model internal, reasoning high; reviewed against c0514bd2d026.
Label changes:
Label justifications:
Evidence reviewed
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.
How this review workflow works
WebSocketClientBase:
- Add _disposing flag distinct from _disposed; set before OnDisposing, promote to _disposed after, so teardown callbacks see a stable state.
- Gate ConnectAsync, ReconnectWithBackoffAsync, listener-finally (CAS/event/reconnect-kickoff), and reconnect kickoff sites on (!_disposed && !_disposing) to prevent post-Dispose work.
- Re-check disposal after OnConnectedAsync and before spawning the listener Task so a Dispose racing the connect path does not leak a background listener.
- CAS-clear + Abort() + Dispose() any ClientWebSocket installed into _webSocket if disposal wins the race during ConnectAsync; mirror in the orphan-clean else-if branch.
- Abort() before Dispose() on owned sockets so peers see a clean RST instead of an unsent CLOSE.
- Volatile reads on shutdown flags in HeartbeatLoopAsync to avoid cache staleness across cores.
- Catch ObjectDisposedException narrowly in SendRawAsync and CloseWebSocketAsync so torn-down sockets do not surface as errors.
- Guard OnDisposing with try/catch so a throwing subclass cannot skip later cleanup steps.
- Per-event try/catch wrappers around status/error event raises so a throwing subscriber cannot block teardown.

OpenClawGatewayClient:
- Apply matching reconnect/teardown hygiene around the keep-alive and heartbeat paths so connection state stays consistent across forced disconnects.

Tests:
- Relax reconnect-backoff log assertion to tolerate jitter in the delay value (still asserts attempt number).

Validation:
- ./build.ps1 clean (0/0)
- Shared.Tests: 2045 passed / 29 skipped
- Tray.Tests: 877 passed
- Manual: tray launched from net10.0-windows10.0.22621.0 build

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
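For readers following the disposal changes described in this commit message, the two-phase flag and the CAS-clear on a lost race can be sketched roughly as below. This is a minimal illustration, not the PR's code: the class name TwoPhaseDisposeSketch and the members TryInstallSocket and IsShuttingDown are hypothetical, and only _disposing, _disposed, _webSocket, and OnDisposing are names taken from the commit message. It also assumes _disposing stays set after promotion, which the commit message does not specify.

```csharp
using System;
using System.Net.WebSockets;
using System.Threading;

// Hypothetical stand-in for WebSocketClientBase; only the disposal gating is shown.
public class TwoPhaseDisposeSketch : IDisposable
{
    private int _disposing; // 1 once Dispose has started (assumed to stay set)
    private int _disposed;  // 1 once OnDisposing has returned
    private ClientWebSocket? _webSocket;

    // Equivalent of the (!_disposed && !_disposing) gate, using volatile reads.
    public bool IsShuttingDown =>
        Volatile.Read(ref _disposed) != 0 || Volatile.Read(ref _disposing) != 0;

    // Hypothetical helper: installs a socket unless disposal has started.
    public bool TryInstallSocket(ClientWebSocket socket)
    {
        if (IsShuttingDown)
        {
            socket.Dispose();
            return false;
        }

        Interlocked.Exchange(ref _webSocket, socket);

        // Re-check: Dispose may have started after the first check.
        if (IsShuttingDown)
        {
            // CAS-clear so only one side tears the socket down.
            if (Interlocked.CompareExchange(ref _webSocket, null, socket) == socket)
            {
                socket.Abort();
                socket.Dispose();
            }
            return false;
        }

        return true;
    }

    protected virtual void OnDisposing() { }

    public void Dispose()
    {
        // Only the first caller proceeds.
        if (Interlocked.Exchange(ref _disposing, 1) != 0) return;

        try { OnDisposing(); }
        catch (Exception) { /* a throwing subclass must not skip later cleanup */ }

        Volatile.Write(ref _disposed, 1);

        // Take ownership of whatever socket is installed, then abort and dispose it.
        var socket = Interlocked.Exchange(ref _webSocket, null);
        if (socket != null)
        {
            socket.Abort();
            socket.Dispose();
        }

        GC.SuppressFinalize(this);
    }
}
```

In this sketch, whichever side removes the socket from _webSocket (the CompareExchange in TryInstallSocket or the Exchange in Dispose) is the only one that calls Abort() and Dispose() on it, so a connect racing a Dispose cannot leave an installed socket behind or tear it down twice.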
b42fb20 to 1f48af3
No code changes; previous push 1f48af3 addressed the Dispose() race finding.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>