Regression stability + new regression tests #2480
Why don't we use a container, then? This would allow us to have infinite reproducibility?

We are effectively running in a container. I haven't tested it, but if the differences come from the underlying hardware, the container won't fix them.

But is it really at the level of the hardware, though? Not at the level of system packages, compilers, etc.?
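To illustrate why identical software might still give different numbers: one possible mechanism (an assumption here, not something confirmed in this thread) is that different hardware evaluates the same reduction in a different order, and floating-point addition is not associative. A minimal sketch:

```python
import math

# Illustrative values only; not taken from the regression tests in this PR.
values = [0.1, 0.2, 0.3]

# Same three numbers, summed in two different orders.
left_to_right = (values[0] + values[1]) + values[2]
right_to_left = values[0] + (values[1] + values[2])

# The results are not bit-identical...
differs = left_to_right != right_to_left

# ...but they agree within a small relative tolerance.
close = math.isclose(left_to_right, right_to_left, rel_tol=1e-12)

print(differs, close)
```

If something like this is at play, a container pins packages and compilers but not the order in which the hardware accumulates sums, so only a tolerance-based comparison would be stable.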
Well, unless the |
|
Selecting |
|
The non-reproducibility here happens non-deterministically, though. But indeed, it could be that updates made one Azure region differ from the other.
OK, for the time being, generating the reference in a custom worker and defining the tolerances with respect to it seems to work. I can try to run it a few more times to get more statistics. @Radonirinaunimi, if you have time to have a quick look at this, it would be very much appreciated. I'm not very happy with the tolerances, and if that can be fixed with a Docker image, please do so. But at the moment I'm more worried about having regression tests that we can rely on (so that people don't learn to simply ignore the X).
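For reference, the idea of checking against a stored reference within tolerances can be sketched roughly as below. This is only a sketch under assumptions: the function name `compare_to_reference`, the tolerance values, and the sample numbers are hypothetical and are not the ones used in this PR.

```python
import math


def compare_to_reference(values, reference, rel_tol=1e-6, abs_tol=1e-9):
    """Return the indices where `values` deviates from `reference`
    by more than the given relative/absolute tolerances."""
    if len(values) != len(reference):
        raise ValueError("values and reference must have the same length")
    return [
        i
        for i, (v, r) in enumerate(zip(values, reference))
        if not math.isclose(v, r, rel_tol=rel_tol, abs_tol=abs_tol)
    ]


# Hypothetical reference (e.g. produced once on a fixed worker)
# and a candidate result from a later CI run.
reference = [1.0, 2.5, 0.0]
candidate = [1.0000001, 2.5, 1e-12]

# With the loose default tolerances nothing is flagged.
loose_failures = compare_to_reference(candidate, reference)

# Tightening the relative tolerance flags the first entry.
tight_failures = compare_to_reference(candidate, reference, rel_tol=1e-9)

print(loose_failures, tight_failures)
```

The absolute tolerance matters for entries whose reference value is zero, where a purely relative check would always fail.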
Fixes #2464