I'm using xdist in order to accelerate my pytest execution. Everything seemed to work fine until I recently tried to run pytest with a different number of workers.
- pytest -n 8 gives 36 failed tests
- pytest -n 9 gives 37 failed tests
- pytest -n 11 gives 38 failed tests
- pytest -n 12 gives 36 failed tests
There is no randomness in the code, and running these different pytest commands reliably reproduces the corresponding number of failing tests. There are also no timing constraints in the code, like 'xyz has to happen within this amount of time'.
I'm confused, and my trust in the pytest results has been shaken. How can I find the cause of this unwanted behaviour and get rid of it without switching back to undistributed pytest execution?
plugins: forked-1.3.0, xdist-2.5.0, anyio-3.5.0
I checked the 'extra' failing test cases and found that some tests which should fail are not failing, and others that should not fail are failing.
pytest -n auto runs with 8 workers and yields the same number of failed tests, but one test that fails with both pytest -n 8 and pytest -n auto fails "differently" (the condition checked in the assertion has different values). Running pytest -n 8 again gives exactly the same failed tests and failing assertions/conditions as before.
So it really does have something to do with how pytest is run..
I also ran plain pytest without -n, and this also leads to some wrong results. Wrong in the sense that when I run the code manually, the result is different.