
Regression stability + new regression tests#2480

Open
scarlehoff wants to merge 17 commits into master from regression_stability

scarlehoff (Member) commented Jun 3, 2026:

  • Add a test for vp-setupfit
  • Improve the stability of the regression tests
  • Include a theory covmat test
  • Generate the data in a dedicated worker
  • Set the tolerance with respect to said worker
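The last two items amount to comparing each new result against the stored reference within a tolerance instead of bit for bit. A minimal sketch of such a check, assuming the regression outputs are NumPy arrays; the helper name `check_regression` and the default `rtol`/`atol` values are hypothetical and are not the values used in this PR:

```python
import numpy as np


def check_regression(new: np.ndarray, reference: np.ndarray,
                     rtol: float = 1e-6, atol: float = 1e-12) -> None:
    """Hypothetical helper: fail only if `new` deviates from `reference`
    beyond the tolerance, instead of requiring bit-for-bit equality."""
    # Shapes must match exactly; only the values are allowed to wobble.
    if new.shape != reference.shape:
        raise AssertionError(f"shape mismatch: {new.shape} vs {reference.shape}")
    # Elementwise check: |new - reference| <= atol + rtol * |reference|
    np.testing.assert_allclose(new, reference, rtol=rtol, atol=atol)
```

A relative tolerance suits outputs that span many orders of magnitude; the small absolute term only matters for entries that are zero or near zero in the reference.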

Fixes #2464

scarlehoff added the redo-regressions label Jun 3, 2026
scarlehoff force-pushed the regression_stability branch from b2fd1ad to b1fc97f June 3, 2026 13:20
scarlehoff added and removed the redo-regressions label Jun 3, 2026
scarlehoff force-pushed the regression_stability branch from b1fc97f to e3ae1d8 June 3, 2026 13:42
scarlehoff added and removed the redo-regressions label Jun 3, 2026
scarlehoff force-pushed the regression_stability branch from a78efb4 to 9979d66 June 3, 2026 14:07
scarlehoff added and removed the redo-regressions label Jun 3, 2026
scarlehoff force-pushed the regression_stability branch from a727524 to 0b6df3c June 3, 2026 18:59
scarlehoff added and removed the buildmaster and redo-regressions labels Jun 3, 2026
scarlehoff force-pushed the regression_stability branch from ef46748 to bd1f82d June 3, 2026 19:38
scarlehoff added the redo-regressions and devtools labels and removed the redo-regressions label Jun 3, 2026
scarlehoff force-pushed the regression_stability branch from 882ce46 to 32b2e7a June 4, 2026 07:04
scarlehoff added and removed the devtools label Jun 4, 2026
scarlehoff force-pushed the regression_stability branch from 22afeea to 3dd49bb June 4, 2026 10:05
scarlehoff force-pushed the regression_stability branch from 6332004 to 34b6ca3 June 15, 2026 12:39
scarlehoff added the redo-regressions label Jun 15, 2026
Radonirinaunimi (Member) commented:

> I'm afraid self-hosting is not an option: the difference is too big, and I don't think we have the infrastructure to run the regression tests on the self-hosted runner for every commit :(
>
> Let's see whether I can find a tolerance value that is acceptable and for which the regressions don't need to be redone that often.

Why don't we use a container then? That would give us full reproducibility.

scarlehoff (Member, Author) commented:

We are effectively running in a container already. I haven't tested it, but if the differences come from the underlying hardware, a container won't fix them.

Radonirinaunimi (Member) commented:

But is it really at the hardware level, though? Could it not be at the level of system packages, compilers, etc.?

scarlehoff force-pushed the regression_stability branch from 2b4f85b to 38c73b3 June 15, 2026 15:43
scarlehoff (Member, Author) commented Jun 15, 2026:

Well, unless ubuntu-latest means different things on different workers, yes. If you compare the logs of two workers that produce different results, the only difference is the Azure region they run in. I don't have enough statistics to know whether the discrepancy would also happen within the same region or only across regions, but that's what I have... so the only way around it is a tolerance.

Radonirinaunimi (Member) commented:

Selecting ubuntu-latest, for example, only determines which OS release you get, but on top of that GitHub continuously updates the runner images themselves (system packages, compilers, tools, etc.). I am pretty sure these updates are the cause of the non-reproducibility. This is really the main reason (on top of LHAPDF) that we run the regressions in a container in PineAPPL and NeoPDF, for example. I could try to take a look whenever I have some time.

scarlehoff (Member, Author) commented Jun 15, 2026:

The non-reproducibility here happens non-deterministically, though. But indeed, it could be updates that made one Azure region differ from another.
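One way to probe the hardware-versus-software question raised in this exchange is to record an environment fingerprint alongside each regression run and compare the fingerprints of runs that disagree. A minimal sketch using only the standard library; the helper names `cpu_model`, `environment_fingerprint` and `diff_fingerprints` and the default package list are hypothetical choices for illustration, not part of this PR:

```python
import platform
import sys
from importlib import metadata


def cpu_model() -> str:
    """Best-effort CPU model name; reads /proc/cpuinfo on Linux."""
    try:
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.startswith("model name"):
                    return line.split(":", 1)[1].strip()
    except OSError:
        pass  # not Linux, or /proc unavailable
    return platform.processor()


def environment_fingerprint(packages=("numpy", "scipy")) -> dict:
    """Hypothetical helper: collect the details that may differ between CI workers."""
    info = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),  # OS and kernel release
        "machine": platform.machine(),    # CPU architecture, e.g. x86_64
        "cpu": cpu_model(),               # hardware model, if available
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = None  # package not installed here
    return info


def diff_fingerprints(a: dict, b: dict) -> dict:
    """Return {key: (value_in_a, value_in_b)} for every key that differs."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}
```

If two disagreeing runs differed only in `cpu`, that would point at the hardware; differing OS or library versions would point at the runner image instead.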

scarlehoff added and removed the redo-regressions label Jun 15, 2026
scarlehoff force-pushed the regression_stability branch from a15bd73 to 0f72a79 June 15, 2026 21:47
scarlehoff added the devtools label Jun 15, 2026
scarlehoff force-pushed the regression_stability branch from 0f72a79 to aa00337 June 15, 2026 22:10
scarlehoff added and removed the redo-regressions label Jun 15, 2026
scarlehoff force-pushed the regression_stability branch from aa00337 to 77b5bda June 15, 2026 22:34
scarlehoff added the redo-regressions label and removed the devtools and redo-regressions labels Jun 15, 2026
scarlehoff force-pushed the regression_stability branch from ed3d7e2 to 423492e June 15, 2026 23:12
scarlehoff (Member, Author) commented:

OK, for the time being, generating the reference in a custom worker and defining the tolerances with respect to it seems to work. I can rerun it a few more times to gather more statistics.

@Radonirinaunimi if you have time to have a quick look at this, it would be very much appreciated. I'm not very happy with the tolerances, and if that can be fixed with a Docker image, please do so. But at the moment I'm more worried about having regressions that we can rely on (so that people don't learn to simply ignore the red X).
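Gathering more statistics can feed directly into the tolerance: rerun the regression several times on different workers, measure the largest relative deviation from the reference worker's output, and add a safety margin. A minimal sketch, assuming each run produces a NumPy array of the same shape; `estimate_rtol`, the safety factor of 10 and the `floor` value are hypothetical choices, not what this PR implements:

```python
import numpy as np


def estimate_rtol(reference: np.ndarray, reruns: list, safety: float = 10.0,
                  floor: float = 1e-15) -> float:
    """Hypothetical helper: derive a relative tolerance from observed scatter.

    `reference` is the output of the dedicated reference worker and `reruns`
    are outputs of the same test on other workers.
    """
    # Avoid dividing by zero for entries that are exactly zero in the reference.
    scale = np.maximum(np.abs(reference), floor)
    worst = 0.0
    for run in reruns:
        rel_dev = np.abs(np.asarray(run) - reference) / scale
        worst = max(worst, float(rel_dev.max()))
    # Inflate the worst observed deviation so that unseen workers still pass;
    # never return a tolerance below `floor`.
    return max(worst * safety, floor)
```

Because the reference comes from a single dedicated worker, it stays fixed between reruns, so a tolerance estimated this way only has to cover the worker-to-worker scatter.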

scarlehoff marked this pull request as ready for review June 16, 2026 09:47

Labels

redo-regressions Recompute the regression data


Development

Successfully merging this pull request may close these issues.

Improve reproducibility of regression tests (and add one with a theory covmat)

2 participants