[1839] Fix: Truncate status.deploy.stdout as it can exceed etcd's limit on large clusters, causing reconciliation delay#1840

Open
himsngh wants to merge 1 commit into carvel-dev:develop from himsngh:himsngh/1839

@himsngh himsngh commented Jun 26, 2026

Contributor

What this PR does / why we need it:

On large clusters (more than roughly 250 nodes), a kapp deploy can produce several MiB of change output (one line per resource per node). kapp-controller writes this output verbatim into status.deploy.stdout and then calls UpdateStatus on the App CR. etcd rejects objects larger than its hard 2 MiB gRPC message limit, so the status write fails repeatedly. The deployment itself has already succeeded and only the status update fails, but the App stays in ReconcileFailed until a subsequent no-op reconcile produces a diff small enough to fit.

This PR fixes the issue by truncating each status output field to 1 MiB before it is written into the status struct. When truncation occurs, the field is prefixed with [output truncated]\n so operators can see immediately that the field is incomplete, rather than receiving silently clipped output. The tail of the output is kept (not the head) because the most actionable content, the final resource summary and any error lines, appears at the end of kapp output.
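The tail-keeping truncation described above can be sketched roughly as follows. The names `truncateTail`, `maxFieldBytes`, and `truncatedPrefix` are hypothetical and may not match the identifiers in the actual change. The sketch also assumes the prefix is added on top of the kept tail, so a truncated field can exceed the limit by the prefix length, and that a cut landing inside a multi-byte UTF-8 character should drop the partial character.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// Hypothetical names; the PR's own identifiers are not shown in this description.
const (
	maxFieldBytes   = 1 << 20 // 1 MiB cap per status field, as described above
	truncatedPrefix = "[output truncated]\n"
)

// truncateTail returns s unchanged when it fits within limit bytes.
// Otherwise it keeps the last limit bytes and prepends truncatedPrefix.
func truncateTail(s string, limit int) string {
	if len(s) <= limit {
		return s
	}
	tail := s[len(s)-limit:]
	// Drop UTF-8 continuation bytes left over if the cut split a character.
	for len(tail) > 0 && !utf8.RuneStart(tail[0]) {
		tail = tail[1:]
	}
	return truncatedPrefix + tail
}

func main() {
	// Illustrative output: many change lines followed by a final summary line.
	out := strings.Repeat("created deployment/x\n", 100000) + "Succeeded\n"
	got := truncateTail(out, maxFieldBytes)

	fmt.Println("input over limit:  ", len(out) > maxFieldBytes)
	fmt.Println("marked truncated:  ", strings.HasPrefix(got, truncatedPrefix))
	fmt.Println("summary line kept: ", strings.HasSuffix(got, "Succeeded\n"))
}
```

Keeping the tail means the final summary line survives even when most of the per-resource lines are dropped.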

Fields covered: status.deploy.stdout, status.deploy.stderr, status.fetch.stdout, status.fetch.stderr, status.inspect.stdout, status.inspect.stderr, status.usefulErrorMessage.
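Applying the same cap to every field listed above could look something like the sketch below. The `stepStatus` and `appStatus` types are simplified, hypothetical stand-ins for the real App status types, and `truncateStatus` is an assumed helper name rather than the function this PR adds.

```go
package main

import "fmt"

const truncatedPrefix = "[output truncated]\n"

// stepStatus and appStatus are simplified, hypothetical stand-ins
// for the real App status types.
type stepStatus struct {
	Stdout, Stderr string
}

type appStatus struct {
	Fetch, Deploy, Inspect *stepStatus
	UsefulErrorMessage     string
}

// truncateTail keeps the last limit bytes of s and marks the result as
// truncated. It returns s unchanged when s already fits.
func truncateTail(s string, limit int) string {
	if len(s) <= limit {
		return s
	}
	return truncatedPrefix + s[len(s)-limit:]
}

// truncateStatus applies the per-field cap to every output field.
func truncateStatus(st *appStatus, limit int) {
	for _, step := range []*stepStatus{st.Fetch, st.Deploy, st.Inspect} {
		if step == nil {
			continue // step did not run, nothing to truncate
		}
		step.Stdout = truncateTail(step.Stdout, limit)
		step.Stderr = truncateTail(step.Stderr, limit)
	}
	st.UsefulErrorMessage = truncateTail(st.UsefulErrorMessage, limit)
}

func main() {
	st := &appStatus{
		Deploy:             &stepStatus{Stdout: "line1\nline2\nline3\n"},
		UsefulErrorMessage: "ok",
	}
	truncateStatus(st, 6) // tiny limit so the effect is visible
	fmt.Printf("%q %q\n", st.Deploy.Stdout, st.UsefulErrorMessage)
}
```

This sketch caps each field independently, so it does not by itself bound the combined size of all fields.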

Which issue(s) this PR fixes:

Fixes #1839

Does this PR introduce a user-facing change?

NA

Additional Notes for your reviewer:

Review Checklist:
  • Follows the developer guidelines
  • Relevant tests are added or updated
  • [-] Relevant docs in this repo added or updated
  • [-] Relevant carvel.dev docs added or updated in a separate PR and there's
    a link to that PR
  • Code is at least as readable and maintainable as it was before this
    change

Additional documentation e.g., Proposal, usage docs, etc.:


@himsngh himsngh changed the title from "[1839] Fix: Truncate status.deploy.stdout as it can exceed etcd's 2 M…" to "[1839] Fix: Truncate status.deploy.stdout as it can exceed etcd's limit on large clusters, causing reconciliation delay" on Jun 29, 2026
…iB limit on large clusters, causing reconciliation delay

Signed-off-by: Himanshu Singh <himansh.singh3@gmail.com>
@himsngh himsngh marked this pull request as ready for review June 29, 2026 08:36

Development

Successfully merging this pull request may close these issues.

status.deploy.stdout can exceed etcd's 2 MiB limit on large clusters, causing reconciliation delays
