Skip to content

fix(orch): prevent NBD dispatch read-loop stall on WRITE_ZEROES (behind flag)#3048

Open
kalyazin wants to merge 3 commits into
mainfrom
fix/nbd-write-zeroes-head-of-line-stall
Open

fix(orch): prevent NBD dispatch read-loop stall on WRITE_ZEROES (behind flag)#3048
kalyazin wants to merge 3 commits into
mainfrom
fix/nbd-write-zeroes-head-of-line-stall

Conversation

@kalyazin

@kalyazin kalyazin commented Jun 19, 2026

Copy link
Copy Markdown
Contributor

Why

The NBD dispatcher can stall its per-connection read loop.

Each connection is served by a single goroutine that reads requests off the socket (Dispatch.Handle). READ and WRITE replies are written from spawned goroutines via writeResponse, which holds a shared writeLock across the socket write (and the socket has no write deadline). WRITE_ZEROES and TRIM, however, are handled inline on the read loop and also go through writeResponse.

If a reply writer is blocked inside writeResponse while holding writeLock — e.g. a large reply whose socket write blocks because the peer is momentarily not draining the socket — an inline WRITE_ZEROES/TRIM on the read loop blocks acquiring writeLock. The read loop then stops reading the socket until the lock is released. Since that loop is the only thing draining the connection, no further requests are processed while it is blocked, and the kernel NBD client can time out the connection.

What

  • Add the nbd-async-write-zeroes feature flag (disabled by default).
  • When enabled, WRITE_ZEROES/TRIM are handled in a goroutine like cmdRead/cmdWrite, so the read loop never acquires writeLock and a blocked reply writer cannot stall it.
  • Make the WRITE_ZEROES backend call context-aware (run in a goroutine, select on ctx), matching cmdRead/cmdWrite, so it cannot hang the pending-response drain on shutdown.
  • No behavior change with the flag off (the default), allowing a gradual rollout.

Validation

  • Regression test TestDispatchWriteZeroesReadLoopStall drives the real dispatch loop with a reply writer blocked while holding writeLock, then dispatches a WRITE_ZEROES and probes with a fresh READ:

    • inline (flag off): the loop stalls and stops serving the probe
    • async (flag on): the loop keeps serving the probe
    • both recover once the blocked reply drains

    Passes under -race.

  • go test ./pkg/sandbox/nbd/... (incl. -race), featureflags tests, golangci-lint, go vet, gofmt all clean.

  • Making WRITE_ZEROES async is consistent with the existing async read/write design: NBD allows out-of-order replies, the guest orders dependent I/O via completions (the reply is sent only after WriteZeroesAt completes), and the overlay cache serializes concurrent writes via its own mutex.

Out of scope (follow-ups)

🤖 Generated with Claude Code

@cla-bot cla-bot Bot added the cla-signed label Jun 19, 2026
@cursor

cursor Bot commented Jun 19, 2026

Copy link
Copy Markdown

PR Summary

Medium Risk
Touches the sandbox NBD I/O path where stalls become guest-visible errors, but default behavior is unchanged and the fix is feature-flag gated with a targeted regression test.

Overview
Fixes a case where WRITE_ZEROES/TRIM handled on the NBD dispatch read loop could block on the shared reply writeLock while another reply was stuck on a slow socket write, stopping request draining and leading to kernel NBD timeouts and guest I/O errors. When the new nbd-async-write-zeroes flag is on (default off), those commands run in a background goroutine like read/write so the read loop stays responsive; WriteZeroesAt is also run with context cancellation like other async handlers, and a regression test locks in inline-vs-async stall behavior for gradual rollout.

Reviewed by Cursor Bugbot for commit fce9507. Bugbot is set up for automated code reviews on this repo. Configure here.

@codecov

codecov Bot commented Jun 19, 2026

Copy link
Copy Markdown

❌ 2 Tests Failed:

Tests completed Failed Passed Skipped
2963 2 2961 8
View the top 2 failed test(s) by shortest run time
github.com/e2b-dev/infra/tests/integration/internal/tests/envd::TestBindLocalhost
Stack Traces | 0s run time
=== RUN   TestBindLocalhost
=== PAUSE TestBindLocalhost
=== CONT  TestBindLocalhost
--- FAIL: TestBindLocalhost (0.00s)
github.com/e2b-dev/infra/tests/integration/internal/tests/envd::TestBindLocalhost/bind_::1
Stack Traces | 3.97s run time
=== RUN   TestBindLocalhost/bind_::1
=== PAUSE TestBindLocalhost/bind_::1
=== CONT  TestBindLocalhost/bind_::1
Executing command python in sandbox iurvkvwhsm3xzk0qzb2fb
    localhost_bind_test.go:69: Command [python] output: event:{start:{pid:1116}}
    localhost_bind_test.go:71: 
        	Error Trace:	.../tests/envd/localhost_bind_test.go:71
        	            				.../hostedtoolcache/go/1.26.3.../src/runtime/asm_amd64.s:1771
        	Error:      	Received unexpected error:
        	            	failed to execute command python in sandbox iurvkvwhsm3xzk0qzb2fb: invalid_argument: protocol error: incomplete envelope: unexpected EOF
        	Test:       	TestBindLocalhost/bind_::1
    localhost_bind_test.go:82: [Status code: 502] Response body: {"sandboxId":"iurvkvwhsm3xzk0qzb2fb","message":"The sandbox is running but port is not open","port":3210,"code":502}
    localhost_bind_test.go:82: [Status code: 502] Response body: {"sandboxId":"iurvkvwhsm3xzk0qzb2fb","message":"The sandbox is running but port is not open","port":3210,"code":502}
--- FAIL: TestBindLocalhost/bind_::1 (3.97s)

To view more test analytics, go to the Test Analytics Dashboard
📋 Got 3 mins? Take this short survey to help us improve Test Analytics.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

In packages/orchestrator/pkg/sandbox/nbd/dispatch.go, cmdWriteZeroes does not handle context cancellation when executing the backend operation. If the backend WriteZeroesAt call blocks indefinitely during shutdown, the goroutine will hang and prevent the orchestrator from draining pending responses cleanly, leading to a shutdown deadlock.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +451 to +465
performWriteZeroes := func() error {
var respErr uint32
if _, err := d.prov.WriteZeroesAt(int64(cmdFrom), cmdLength); err != nil {
respErr = 1
logger.L().Error(ctx, "nbd backend write-zeroes failed",
zap.Error(err),
zap.String("nbd_provider", d.provName),
zap.Uint64("nbd_handle", cmdHandle),
zap.Uint64("nbd_offset", cmdFrom),
zap.Int64("nbd_length", cmdLength),
)
}

return d.writeResponse(respErr, cmdHandle, nil)
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Unlike cmdRead and cmdWrite, cmdWriteZeroes does not handle context cancellation when executing the backend operation. If the backend WriteZeroesAt call blocks indefinitely during shutdown, the goroutine will hang and prevent the orchestrator from draining pending responses cleanly, leading to a shutdown deadlock. Spawning a goroutine for the backend call and selecting on ctx.Done allows the dispatcher to unblock and exit gracefully.

	performWriteZeroes := func() error {
		errchan := make(chan error, 1)
		go func() {
			_, err := d.prov.WriteZeroesAt(int64(cmdFrom), cmdLength)
			errchan <- err
		}()

		var writeErr error
		select {
		case <-ctx.Done():
			writeErr = ctx.Err()
		case err := <-errchan:
			writeErr = err
		}

		var respErr uint32
		if writeErr != nil {
			respErr = 1
			logger.L().Error(ctx, "nbd backend write-zeroes failed",
				zap.Error(writeErr),
				zap.String("nbd_provider", d.provName),
				zap.Uint64("nbd_handle", cmdHandle),
				zap.Uint64("nbd_offset", cmdFrom),
				zap.Int64("nbd_length", cmdLength),
			)
		}

		return d.writeResponse(respErr, cmdHandle, nil)
	}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — fixed. cmdWriteZeroes now runs WriteZeroesAt in a goroutine and selects on ctx, mirroring cmdRead/cmdWrite, so a blocked backend call can't hang the goroutine or the pending-response drain on shutdown. Folded into the refactor commit (performWriteZeroes).

@kalyazin kalyazin force-pushed the fix/nbd-write-zeroes-head-of-line-stall branch from 195a974 to e835f18 Compare June 19, 2026 16:44
kalyazin and others added 3 commits June 19, 2026 17:50
Add an asyncWriteZeroes field to Dispatch and a corresponding NewDispatch
parameter, and split cmdWriteZeroes into a shared performWriteZeroes
closure that can run either inline or in a goroutine.

All callers pass false, so behavior is unchanged: WRITE_ZEROES/TRIM still
run inline on the read loop. This prepares for gating the async path
behind a feature flag.

Signed-off-by: Nikita Kalyazin <nikita.kalyazin@e2b.dev>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add the nbd-async-write-zeroes feature flag (default false), evaluate it
per device mount, and pass it to NewDispatch. When enabled, WRITE_ZEROES
and TRIM are handled in a goroutine instead of inline on the dispatch read
loop.

Inline handling shares writeLock with the asynchronous read/write reply
writers. If a reply write blocks on a full socket send buffer while holding
writeLock, a WRITE_ZEROES processed on the read loop blocks acquiring the
lock; the loop then stops draining the socket and the kernel times out the
NBD connection, returning EIO to the guest and crashing Firecracker on the
next flush. Moving the work off the loop keeps the socket drained.

Disabled by default for a gradual rollout.

Signed-off-by: Nikita Kalyazin <nikita.kalyazin@e2b.dev>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Drive the real Dispatch loop over an in-memory socket whose reply Write
blocks, simulating a full kernel send buffer. With a READ reply blocked
while holding writeLock, dispatch a WRITE_ZEROES and probe with a fresh
READ:

  - inline (flag off): the loop stalls and stops serving the probe
  - async  (flag on):  the loop keeps serving the probe

Both modes recover once the blocked reply drains, matching the transient,
self-clearing incidents this fix targets.

Signed-off-by: Nikita Kalyazin <nikita.kalyazin@e2b.dev>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@kalyazin kalyazin force-pushed the fix/nbd-write-zeroes-head-of-line-stall branch from b0ac16d to fce9507 Compare June 19, 2026 16:51
@kalyazin kalyazin marked this pull request as ready for review June 19, 2026 17:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant