Regression stability + new regression tests by scarlehoff · Pull Request #2480 · NNPDF/nnpdf

scarlehoff · 2026-06-03T13:13:27Z

Add a test for vp-setupfit
Improve the stability of the regression tests
Include a theory covmat test
Generate the data in a dedicated worker
Set the tolerance with respect to said worker
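
The last two points amount to comparing each new result against reference numbers generated once on a dedicated worker. Below is a minimal sketch of such a check, not the code in this PR: the function name `check_against_reference`, the tolerance values and the numbers are all hypothetical, and only `numpy.testing.assert_allclose` is an existing library call.

```python
import numpy as np


def check_against_reference(new, reference, rtol=1e-6, atol=1e-9):
    """Return True if `new` matches `reference` within the tolerances.

    rtol and atol are illustrative values, not the ones used in this PR.
    """
    try:
        np.testing.assert_allclose(new, reference, rtol=rtol, atol=atol)
    except AssertionError:
        return False
    return True


# Reference generated once on the dedicated worker (made-up numbers).
reference = np.array([1.0, 2.0, 3.0])
# Same computation on another runner, with small floating-point noise.
other_runner = reference * (1 + 1e-8)
# A genuine regression: one value changes well beyond the noise.
regression = reference + np.array([0.0, 0.0, 1e-3])

print(check_against_reference(other_runner, reference))
print(check_against_reference(regression, reference))
```

With these made-up numbers the noisy result passes and the shifted one fails, which is the behaviour a tolerance is meant to give: runner-to-runner noise is absorbed, real changes are still flagged.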


Radonirinaunimi · 2026-06-15T15:23:10Z

I'm afraid self-hosting is not a possibility, the difference is too big and I don't think we have the infrastructure to run the regression tests on the self-hosted runner for every commit :(

Let's see whether I can find a tolerance value that is acceptable and for which the regressions don't need to be redone that often.

Why don't we use a container then? Wouldn't this allow us to have infinite reproducibility?

scarlehoff · 2026-06-15T15:25:10Z

We are effectively running in a container. I haven't tested it, but if the differences come from the underlying hardware, the container won't fix them.

Radonirinaunimi · 2026-06-15T15:42:12Z

But is it really at the level of the hardware though? Not at the level of system packages, compilers, etc.?

scarlehoff · 2026-06-15T15:45:30Z

Well, unless ubuntu-latest means different things, yes. If you take the logs from two workers that produce different results, the only difference is the Azure region in which they are running. I don't have enough statistics to know whether it would also happen within the same region, or whether it would work across different regions, but it's what I have... so the only way around it is a tolerance.

Radonirinaunimi · 2026-06-15T16:51:27Z

Selecting ubuntu-latest, for example, only ensures OS upgrades, but on top of this GitHub continuously updates the runner images themselves (system packages, compilers, tools, etc.), and I am pretty sure these updates are the cause of the non-reproducibility. This is really the main reason (on top of LHAPDF) why we run regressions in a container in PineAPPL and NeoPDF, for example. I could try to have a look whenever I have some time.
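
One way to check whether two runners differ at the level of system packages rather than hardware is to log a fingerprint of the environment in each job and compare the two logs. The following is only a sketch using the standard library: the function names are hypothetical, the package list is a placeholder, and nothing like this is confirmed to exist in the NNPDF workflows.

```python
import hashlib
import json
import platform
import sys
from importlib import metadata


def environment_fingerprint(packages=("numpy",)):
    """Collect interpreter, OS, CPU and package versions for one runner."""
    info = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "machine": platform.machine(),
        "processor": platform.processor(),
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package not installed on this runner.
            info[name] = None
    return info


def fingerprint_digest(info):
    """Short stable hash of the fingerprint, handy for comparing two logs."""
    payload = json.dumps(info, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()[:12]


info = environment_fingerprint()
print(json.dumps(info, indent=2))
print(fingerprint_digest(info))
```

If two jobs print the same digest but still produce different numbers, the packages covered by the fingerprint can be ruled out as the cause; a differing digest points to the image contents instead.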

scarlehoff · 2026-06-15T18:18:44Z

The non-reproducibility here happens non-deterministically though. But indeed, it could be updates that made one Azure region differ from the other.


scarlehoff · 2026-06-16T07:24:36Z

Ok, for the time being generating the reference in a custom worker and defining the tolerances with respect to it seems to work. I can try to make it run a few more times to get more statistics.

@Radonirinaunimi if you have time to have a quick look at this it would be very much appreciated. I'm not very happy with the tolerances, and if that can be fixed with a Docker image please do so. But at the moment I'm more worried about having regressions that we can rely on (so that people don't learn to simply ignore the X).
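
Running the job a few more times to "get more statistics" could feed directly into the choice of tolerance. The sketch below is an assumption about how that might be done, not what the PR does: `suggest_rtol`, the safety factor and the numbers are hypothetical, and the reference values are assumed to be non-zero.

```python
import numpy as np


def suggest_rtol(reference, runs, safety_factor=10.0):
    """Largest relative deviation from the reference, times a safety factor."""
    reference = np.asarray(reference, dtype=float)
    worst = 0.0
    for run in runs:
        deviation = np.abs(np.asarray(run, dtype=float) - reference)
        # Assumes no zeros in the reference, otherwise this divides by zero.
        relative = deviation / np.abs(reference)
        worst = max(worst, float(relative.max()))
    return worst * safety_factor


# Reference from the custom worker and three repeated runs (made-up numbers).
reference = np.array([1.0, 2.0, 4.0])
runs = [
    reference * (1 + 1e-7),
    reference * (1 - 2e-7),
    reference,
]
print(suggest_rtol(reference, runs))
```

Here the worst run deviates by about 2e-7, so the suggested relative tolerance is about 2e-6. The safety factor is a judgment call: too small and the regressions have to be redone often, too large and real changes slip through.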

scarlehoff added the redo-regressions Recompute the regression data label Jun 3, 2026

scarlehoff force-pushed the regression_stability branch from b2fd1ad to b1fc97f Compare June 3, 2026 13:20

scarlehoff added redo-regressions Recompute the regression data and removed redo-regressions Recompute the regression data labels Jun 3, 2026

scarlehoff force-pushed the regression_stability branch from b1fc97f to e3ae1d8 Compare June 3, 2026 13:42

scarlehoff added redo-regressions Recompute the regression data and removed redo-regressions Recompute the regression data labels Jun 3, 2026

scarlehoff force-pushed the regression_stability branch from a78efb4 to 9979d66 Compare June 3, 2026 14:07

scarlehoff added redo-regressions Recompute the regression data and removed redo-regressions Recompute the regression data labels Jun 3, 2026

scarlehoff force-pushed the regression_stability branch from a727524 to 0b6df3c Compare June 3, 2026 18:59

scarlehoff added buildmaster redo-regressions Recompute the regression data and removed redo-regressions Recompute the regression data buildmaster labels Jun 3, 2026

scarlehoff force-pushed the regression_stability branch from ef46748 to bd1f82d Compare June 3, 2026 19:38

scarlehoff added redo-regressions Recompute the regression data devtools Build, automation and workflow and removed redo-regressions Recompute the regression data labels Jun 3, 2026

scarlehoff added 4 commits June 4, 2026 08:20

test for changed files on setupfit

ec8b332

bugfix for vp-setupfit where a second loader was being used; add a no…

c89ad76

…-eko-download for debug purposes

add a starting point for hyperopt runcard

38a9871

automatically regenerate also the setupfit files

22cd99e

scarlehoff force-pushed the regression_stability branch from 882ce46 to 32b2e7a Compare June 4, 2026 07:04

scarlehoff added devtools Build, automation and workflow and removed devtools Build, automation and workflow labels Jun 4, 2026

scarlehoff and others added 2 commits June 4, 2026 12:05

add a min delta parameter to patience to stabilize a bit the regression

b3fd3aa

Automatically regenerated regressions from PR 2480, branch regression…

3dd49bb

…_stability.
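
The idea behind the min delta commit above (b3fd3aa) can be illustrated with a generic early-stopping loop. This is a sketch of the general technique, not n3fit's stopping code: the function name, the loss values and the exact definition of the threshold are assumptions.

```python
def epochs_until_stop(losses, patience, min_delta=0.0):
    """Return the epoch index at which early stopping triggers, or None.

    An epoch counts as an improvement only if the loss drops by more than
    `min_delta` below the best value seen so far.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(losses):
        if loss < best - min_delta:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


# Validation losses that keep creeping down by tiny, noise-level amounts.
losses = [1.0, 0.5, 0.4999, 0.4998, 0.4997, 0.4996]

print(epochs_until_stop(losses, patience=3))                  # None
print(epochs_until_stop(losses, patience=3, min_delta=1e-3))  # 4
```

Without a minimum delta, every noise-level decrease resets the patience counter, so the stopping epoch depends on digits that can differ between machines. With it, only improvements above the threshold count and the stopping point is less sensitive to that noise.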

scarlehoff force-pushed the regression_stability branch from 22afeea to 3dd49bb Compare June 4, 2026 10:05

docs update; change the delta for hyperopt and polarized

e6451da

scarlehoff force-pushed the regression_stability branch from 6332004 to 34b6ca3 Compare June 15, 2026 12:39

scarlehoff added the redo-regressions Recompute the regression data label Jun 15, 2026

Automatically regenerated regressions from PR 2480, branch regression…

ea85d1a

…_stability.

allow for a tolerance for setupfit generated covmats

38c73b3

scarlehoff force-pushed the regression_stability branch from 2b4f85b to 38c73b3 Compare June 15, 2026 15:43

scarlehoff added redo-regressions Recompute the regression data and removed redo-regressions Recompute the regression data labels Jun 15, 2026

scarlehoff force-pushed the regression_stability branch from a15bd73 to 0f72a79 Compare June 15, 2026 21:47

scarlehoff added the devtools Build, automation and workflow label Jun 15, 2026

scarlehoff force-pushed the regression_stability branch from 0f72a79 to aa00337 Compare June 15, 2026 22:10

scarlehoff added redo-regressions Recompute the regression data and removed redo-regressions Recompute the regression data labels Jun 15, 2026

allow for overrides when sampling from hyperopt

77b5bda

scarlehoff force-pushed the regression_stability branch from aa00337 to 77b5bda Compare June 15, 2026 22:34

scarlehoff added redo-regressions Recompute the regression data and removed devtools Build, automation and workflow redo-regressions Recompute the regression data labels Jun 15, 2026

Redo regressions bot and others added 2 commits June 15, 2026 22:54

Automatically regenerated regressions from PR 2480, branch regression…

6d8a28a

…_stability.

up the tolerance for hyperopt

423492e

scarlehoff force-pushed the regression_stability branch from ed3d7e2 to 423492e Compare June 15, 2026 23:12

scarlehoff requested a review from Radonirinaunimi June 16, 2026 07:22

blur the plots so they are more robust

46d64bb
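
Why blurring makes an image comparison more robust can be seen with a toy example. The code below is an illustration only: it uses a simple box blur written with NumPy, while the filter and comparison actually used in commit 46d64bb are not shown here, and all names and arrays are hypothetical.

```python
import numpy as np


def box_blur(image, radius=1):
    """Average each pixel with its neighbours in a (2*radius+1)^2 window."""
    padded = np.pad(image.astype(float), radius, mode="edge")
    size = 2 * radius + 1
    out = np.zeros(image.shape, dtype=float)
    # Sum the shifted copies of the padded image, one per window offset.
    for dy in range(size):
        for dx in range(size):
            out += padded[dy : dy + image.shape[0], dx : dx + image.shape[1]]
    return out / size**2


def max_abs_difference(a, b, radius=0):
    """Largest pixel difference, optionally after blurring both images."""
    if radius:
        a, b = box_blur(a, radius), box_blur(b, radius)
    return float(np.max(np.abs(a.astype(float) - b.astype(float))))


# Two 8x8 grayscale "plots" with the same vertical line, shifted by one pixel.
reference = np.zeros((8, 8))
reference[:, 3] = 1.0
shifted = np.zeros((8, 8))
shifted[:, 4] = 1.0

print(max_abs_difference(reference, shifted))
print(max_abs_difference(reference, shifted, radius=1))
```

A one-pixel shift gives the maximum possible difference (1.0) on the raw images but only about a third of that after blurring, so a rendering difference of that size is less likely to exceed a given tolerance.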

scarlehoff marked this pull request as ready for review June 16, 2026 09:47

Regression stability + new regression tests #2480
scarlehoff wants to merge 17 commits into master from regression_stability
