HDDS-14774. Fix intermittent timeout in TestContainerReportHandling #10535

Merged
amaliujia merged 2 commits into apache:master from chihsuan:HDDS-14774 on Jun 19, 2026

@chihsuan chihsuan commented Jun 17, 2026

Contributor

What changes were proposed in this pull request?

TestContainerReportHandling#testDeletingOrDeletedContainerWhenNonEmptyReplicaIsReported intermittently times out on CI while waiting for replicas to be deleted.

In this test, SCM only deletes a DELETING/DELETED container's replicas when it processes a container report. The test triggers that report once, by restarting the datanodes. Because hdds.container.report.interval is left at its 60-minute default, if that single report is disrupted (e.g. command expiry, missed heartbeat, or a datanode briefly marked stale/dead under CI load), the next report is 60 minutes away, far beyond the 180s wait, so the test times out.

The fix sets hdds.container.report.interval to 1s, matching sibling tests (TestECContainerRecovery, TestBlockDeletion). A periodic report then re-triggers the deletion within seconds, so the test recovers from a single disrupted report well within the 180s wait.
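The timing argument above can be checked with a small standalone sketch. It is not code from this PR: the class and method names are hypothetical, and it assumes a simplified model in which reports fire at multiples of the interval (the restart-triggered report counts as the one at time zero) and a fixed number of leading reports are lost.

```java
import java.time.Duration;

// Hypothetical illustration only; none of these names exist in Ozone.
public class ReportIntervalSketch {

    // Assumption: reports fire at 0, interval, 2 * interval, ... and the
    // first `disruptedReports` of them are lost. The first report that SCM
    // actually processes therefore arrives at interval * disruptedReports.
    public static Duration firstSuccessfulReport(Duration interval, int disruptedReports) {
        return interval.multipliedBy(disruptedReports);
    }

    // True if the first processed report arrives no later than the test's wait.
    public static boolean recoversWithinWait(Duration interval, int disruptedReports, Duration wait) {
        return firstSuccessfulReport(interval, disruptedReports).compareTo(wait) <= 0;
    }

    public static void main(String[] args) {
        Duration wait = Duration.ofSeconds(180); // wait used by the test

        // 60-minute default: the next report is 3600s away, past the wait.
        System.out.println(recoversWithinWait(Duration.ofMinutes(60), 1, wait)); // false

        // 1s interval: the next report arrives after 1s, well inside the wait.
        System.out.println(recoversWithinWait(Duration.ofSeconds(1), 1, wait)); // true
    }
}
```

Under this model, a 1s interval would tolerate up to 180 consecutive lost reports before the wait expires, while the 60-minute default tolerates none.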

TestContainerReportHandlingWithHA shares the identical pattern and root cause, so it is fixed the same way in this PR.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-14774

How was this patch tested?

  • Reproduced the timeout deterministically: with the datanode restart removed (simulating a disrupted report) and the default report interval, all parameters fail with the same TimeoutException; with the 1s interval added, they pass.
  • Ran both TestContainerReportHandling and TestContainerReportHandlingWithHA locally; all parameters pass.
  • checkstyle passes.

Generated-by: Claude Code (Opus 4.8)

@chihsuan chihsuan marked this pull request as ready for review June 17, 2026 15:30
@amaliujia

Contributor

org.apache.hadoop.ozone.recon.TestStorageDistributionEndpointEC looks flaky

@amaliujia amaliujia merged commit c479e1e into apache:master Jun 19, 2026
72 of 76 checks passed
@adoroszlai

Contributor

> org.apache.hadoop.ozone.recon.TestStorageDistributionEndpointEC looks flaky

HDDS-15519
