CLI: Stats command - fix incorrect CPU % reporting by dkbennett · Pull Request #40627 · microsoft/WSL

dkbennett · 2026-05-22T18:00:44Z

Summary of the Pull Request

The CPU calculation is incorrect due to precpu_stats not being populated from the docker request. This is due to the combination of "stream=false" and "one-shot=true" in the request parameters. In this combination, docker engine does not populate precpu_stats. The fix is to remove "one-shot=true" from the request, which causes docker engine to do two samples internally and then have valid population for precpu_stats.

Since docker does an internal sampling of ~1s, and docker does not support batch container requests (one container id per request), running stats on all active containers now has a performance issue of 1s * N containers. This fix also runs the stats requests for all containers in parallel to mitigate this performance issue, and adds a generous E2E test to verify the parallel processing and performance is as-expected and does not regress.

This fix also corrects some minor differences in the stats calculation in the CLI end for edge cases we will probably never hit but aligning it to be exact for the sake of being as correct as possible with the Docker CLI implementation.

PR Checklist

Closes: Link to issue #xxx
Communication: I've discussed this with core contributors already. If work hasn't been agreed, this work might be rejected
Tests: Added/updated if needed and all pass
Localization: All end user facing strings can be localized
Dev docs: Added/updated if needed
Documentation updated: If checked, please file a pull request on our docs repo and link it here: #xxx

Detailed Description of the Pull Request / Additional comments

Three changes were made here:

The stats call in the docker request dropped "one-shot=true". This, when combined with "stream=false" caused docker to not populate precpu_stats. They were zeroed out (confirmed in debugger). Removing this line caused docker engine to do an internal 1s block to capture the prior sample. With the one-shot parameter removed, the precpu_stats were now populated (also confirmed in debugger).
The stats calculation had two adjustments to align it exactly with Docker's calculation:

The "onlineCpus" was changed to have a fallback to percpu_usage size, as docker does. This is because in older docker APIs the stats can report onlineCpus as 0. In this case we had different behavior so this was fixed. The debugger confirmed that this is not actually the case with the APIs we are using so this is only a defensive change in case the API behavior were to change in the future, we should not break.
The system delta & cpudelta check was using >= while docker uses >. This also should not have any meaningful effect, but for exact correctness with Docker's algorithm it is being fixed.

Stats are requested per-container and are now in parallel to account for the ~1 sampling time per container. With all requests being run simultaneously the actual runtime of the stats command should be within a few seconds, even with many containers. A test was added to verify this that is intentionally generous to account for a potentially stressed test machine.

The first change is the real fix to the CPU issue. The second is defensive alignment for edge cases that we should not in practice hit, but might someday encounter, and the third is a mitigation for the potential scaling issue the first change introduced.

Validation Steps Performed

New parallel stats command with 20 'sleep infinity' containers to verify the sampling time for a large number of containers is relatively close to ~1s. This timing is very generous given the test machine may be stressed. Currently it allows for up to 13s for it to run and be considered still running in parallel. If this test starts having random failures we can revisit the parameters or consider removing the test.

1 Ran stress tests with 1 cpu and 16 cpus:

wslc run --rm alpine sh -c "apk add -q stress-ng && stress-ng --cpu 16 --timeout 60"
wslc run --rm alpine sh -c "apk add -q stress-ng && stress-ng --cpu 1 --timeout 60"
2 Ran stats in another window to confirm values

Confirmed CPU% was around 100% for the --cpu 1 case and 1600% for the --cpu 16 case. This is correct with Docker's CPU% reporting as 100% per core. 1 fully loaded cpu should be 100%, and 16 should be around 1600%. Excess over these expectations is also expected and includes the process's own overhead and init process consuming additional CPU.

Also here is a manual run of 20 containers and the stats command clearly running far less than 20s the sequential command would have.

Copilot

Pull request overview

This PR fixes incorrect CPU% reporting in the wslc ... stats command by ensuring Docker’s precpu_stats is populated and by aligning the CLI-side CPU% calculation with Docker CLI’s algorithm.

Changes:

Removed the one-shot=true stats query parameter so Docker returns precpu_stats alongside cpu_stats when stream=false.
Updated CPU% calculation to (a) fall back to percpu_usage length when online_cpus == 0 and (b) match Docker’s > 0 delta checks.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File	Description
src/windows/wslcsession/DockerHTTPClient.cpp	Adjusts Docker stats request parameters to ensure `precpu_stats` is populated for CPU% deltas.
src/windows/wslc/tasks/ContainerTasks.cpp	Aligns CPU% computation with Docker CLI, including `online_cpus` fallback behavior and delta checks.

OneBlue · 2026-05-27T18:05:44Z

-        const auto onlineCpus = stats.cpu_stats.online_cpus > 0 ? stats.cpu_stats.online_cpus : 1u;
-        if (systemDelta > 0.0 && cpuDelta >= 0.0)
+
+        // When online_cpus is 0 (older API responses), fall back to percpu_usage array length — matches Docker CLI behavior.


Is that something that can happen with our version of docker ? If not, I'd just recommend removing that block entirely since the API response we get should be consistent

OneBlue · 2026-05-27T18:05:58Z

+
+        VERIFY_IS_GREATER_THAN_OR_EQUAL(
+            elapsedMs, c_minExpectedMs, L"Stats completed suspiciously fast - Docker sampling may not be occurring");
+        VERIFY_IS_LESS_THAN(


Sadly this is one of the cases that's difficult to test in a reliable way. I don't recommend running this timing check in the CI, since it's likely to break if Defender / Windows Update triggers right as this test runs.

OK I will remove this test. I have refactored some things with the latest update, and included a test for the batching async code independent of the stats calling it, so that should provide good coverage that the batching / async behavior is working as expected and we can avoid a potentially risky test like this one.

OneBlue · 2026-05-27T18:07:58Z


-    // Build stats as a json array first for later filtering or display either as json or table format.
-    nlohmann::json statsJson = nlohmann::json::array();
+    // Fetch stats for all containers concurrently. The Docker engine blocks for ~1s per


Hmm that's too bad that the request hangs for a second. Maybe we could implement a workaround in the service to issue N requests in parallel and return it to the client, but the current API doesn't support that so I think this is OK for now

Latest iteration does batches of requests in parallel so we can handle the common case very rapidly but protect against extreme cases of very large container counts that might overly stress the endpoints. I put in a batch size of 10 by default but its configurable.

…atch

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

+
+#include <future>
+#include <optional>


+template <typename TItem, typename TWork, typename TSuccess, typename TError>
+void ForEachAsync(const std::vector<TItem>& items, TWork onWork, TSuccess onSuccess, TError onError, size_t batchSize = 10)
+{
+    WI_ASSERT(batchSize > 0);


+    struct BatchResult
+    {
+        TItem item;
+        std::optional<TResult> result;
+        wil::ResultException error{S_OK};
+        bool hasError{false};
+    };


+    wsl::windows::wslc::ForEachAsync<std::wstring>(
+        containers,
+        // Work
+        [&session](const std::wstring& containerId) {
+            return ComputeContainerStatsJson(ContainerService::Stats(session, WideToMultiByte(containerId)));
+        },


+    if (onlineCpus == 0)
+    {
+        onlineCpus = stats.cpu_stats.cpu_usage.percpu_usage.has_value()
+                         ? static_cast<uint32_t>(stats.cpu_stats.cpu_usage.percpu_usage->size())
+                         : 1u;


dkbennett added 2 commits May 22, 2026 09:27

Stats: fix docker reporting online cpus as 0

3c75f38

Fix precpu_stats not being populated in docker engine reponse

453014d

dkbennett requested a review from a team as a code owner May 22, 2026 18:00

Copilot AI review requested due to automatic review settings May 22, 2026 18:00

Copilot started reviewing on behalf of dkbennett May 22, 2026 18:00 View session

Use en-US spelling in comment instead of en-GB

973f2d4

Copilot AI reviewed May 22, 2026

View reviewed changes

Comment thread src/windows/wslcsession/DockerHTTPClient.cpp

Update stats to run container requests in parallel, and test it

beacc5f

OneBlue reviewed May 27, 2026

View reviewed changes

Add async execution helper, tests, and refactor stats to process in b…

4fa9830

…atch

Copilot AI review requested due to automatic review settings May 27, 2026 21:12

Copilot started reviewing on behalf of dkbennett May 27, 2026 21:12 View session

Formatting

3406bac

Copilot AI reviewed May 27, 2026

View reviewed changes

Remove potentially unreliable test

f1aaacb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CLI: Stats command - fix incorrect CPU % reporting#40627

CLI: Stats command - fix incorrect CPU % reporting#40627
dkbennett wants to merge 7 commits into
masterfrom
user/dkbennett/statsfix

dkbennett commented May 22, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

OneBlue May 27, 2026

Uh oh!

OneBlue May 27, 2026

Uh oh!

dkbennett May 27, 2026

Uh oh!

OneBlue May 27, 2026

Uh oh!

dkbennett May 27, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants


		#include <future>
		#include <optional>

Conversation

dkbennett commented May 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary of the Pull Request

PR Checklist

Detailed Description of the Pull Request / Additional comments

Validation Steps Performed

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

OneBlue May 27, 2026

Choose a reason for hiding this comment

Uh oh!

OneBlue May 27, 2026

Choose a reason for hiding this comment

Uh oh!

dkbennett May 27, 2026

Choose a reason for hiding this comment

Uh oh!

OneBlue May 27, 2026

Choose a reason for hiding this comment

Uh oh!

dkbennett May 27, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

dkbennett commented May 22, 2026 •

edited

Loading