
I have a repository of several hundred tests that have been fast enough up to now, but as we continue to grow the codebase I worry that it will get so slow that my team will be caught up in waiting for CI runs to complete.

What can I do to speed this up and make my tests faster both in the short run and in the long run?

I need to consider:

  1. Scalability
  2. Cost
  3. Rollout
vanhooser

1 Answer


We can speed up test runs using horizontal and vertical scaling, but first we need to make our tests parallel-safe. There are also a few PyTest issues we must work around to get there. Finally, we can be clever about how we roll out parallelization for tests that are difficult to make parallel-safe.

Let's dig in.

⚖️ Parallel-Safe

Tests written in the past may assume serial execution, namely that the DB was left in a particular state by earlier tests. Under a different execution order, such tests start failing non-deterministically. You will have to ensure that every test creates DB state scoped specifically to that test, sets up all the objects it needs, and (optionally) tears them down once the test is done. Fixtures will be your friend here, as they can create the necessary DB state and clean up afterwards.

One antipattern from serial execution is asserting on total row counts in the DB, e.g.:

def test_1() -> None:
    create_some_rows_in_db()
    assert get_rows_in_db() == 1

def test_2() -> None:
    create_some_more_rows_in_db()
    assert get_rows_in_db() == 2

If we were to run these tests in a different order, they would fail. We need instead to create rows in the DB that exactly correspond to just our test session, and similarly we need to fetch rows from the DB that are only for this test session.

def test_1() -> None:
    scope = uuid4()
    create_some_rows_in_db(scope=scope)
    assert get_rows_in_db(scope=scope) == 1

def test_2() -> None:
    scope = uuid4()
    create_some_more_rows_in_db(scope=scope)
    assert get_rows_in_db(scope=scope) == 1

Consistently Ordered

There are two ways test ordering can break: the test name may change between runs, and test collection order is not sorted by name by default.

If you derive values like UUIDs in parametrized tests, those values change between runs, which means the names of the tests themselves change. When running in parallel, the workers will then see different test names and PyTest will fail to collect consistently. Luckily, it's straightforward to stop creating parametrized arguments that change between runs.

Concretely, if you have tests initially that look like:

@pytest.mark.parametrize("my_arg,...", [(uuid4(), ...), (uuid4(), ...)])
def test_some_code(my_arg: UUID, ...) -> None:
    assert my_arg is not None

Then you will need to change it to derive the argument inside the function.

@pytest.mark.parametrize("...", [(...),])
def test_some_code(...) -> None:
    my_arg = uuid4()
    assert my_arg is not None

Next, we also need to patch the collection order of parametrized tests, which means we add the following to our conftest.py:

def pytest_collection_modifyitems(items: list[pytest.Item]) -> None:
    def param_part(item: pytest.Item) -> str:
        # find the start of the parameter part in the nodeid
        index = item.nodeid.find("[")
        if index > 0:
            # sort by the parameter part of the test name
            parameter_index = item.name.index("[")
            return item.name[parameter_index:]

        # for all other cases, sort by node id as usual
        return item.nodeid

    # re-order the items using the param_part function as key
    items[:] = sorted(items, key=param_part)

↕️ Vertical Scaling

Next, we can run our tests in parallel within a single GitHub Actions runner using pytest-xdist. Installing and configuring the package is straightforward, and GitHub-hosted runners provide 2 CPUs by default for us to take advantage of.

In the future, it may be possible to scale up the size of the machines running these tests. For now, 2 cores offer a decent speedup, but we can go further.
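As a sketch, the workflow step enabling pytest-xdist might look like the following (the step name and install line are illustrative; adapt them to your existing workflow):

```yaml
- name: Run tests in parallel on one runner
  run: |
    pip install pytest pytest-xdist
    # -n auto spawns one worker per available CPU core
    pytest -n auto
```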

↔️ Horizontal Scaling

Vertical scaling offered a decent speedup, but what we really want is to split our test work across multiple runners. Luckily, pytest-split accomplishes this for us brilliantly.

It's quite simple to enable in your workflow .yml as demonstrated here, and when combined with a GitHub Actions matrix, we can tell PyTest to run a fractional share of all the available tests on each runner.

Each runner collects all the tests but runs only its own split, leaving the remainder for the other runners to execute. Adding or removing runners is now just a change to the matrix argument, so we can scale the number of parallel executions up or down to match our SLA and budget.

I'd also recommend using pytest-split's stored test durations (--store-durations / --durations-path) so the tests allocated to each runner are evenly balanced by runtime.
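A minimal matrix job might look like this sketch. The group count of 4, the job name, and the action versions are illustrative assumptions; --splits and --group are pytest-split's own flags:

```yaml
jobs:
  tests:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        group: [1, 2, 3, 4]   # number of entries = number of splits
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install pytest pytest-split pytest-xdist
      # Each runner executes only its 1/4 share of the suite.
      - run: pytest --splits 4 --group ${{ matrix.group }}
```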

Speaking of budget...

Cancel Previous

If we want to be careful about costs, it's advantageous to cancel still-running workflows for previous commits, as demonstrated here. This claws back some of the now higher per-commit execution cost. I'd recommend starting with a small matrix of workers, seeing what cost you are comfortable taking on, then adding runners as necessary to meet your turnaround-time needs.
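In GitHub Actions, cancellation is a top-level concurrency block in the workflow file; this sketch groups runs by branch so a new push cancels the in-flight run for the same branch:

```yaml
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
```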

Adoption

Let's say we don't have the time or resources to migrate all of our tests to being parallel-safe. To offer developers an escape hatch for tests that should always run in serial, we can mark tests with pytest.mark.serial to ensure they run in the same order every time. We will then need to configure our GitHub workflow .yml to execute these tests separately from the matrix runs, but this is straightforward to implement.

...
# Serial Execution 
pytest -vv -x -n 0 -m "serial"

...
# Parallel Execution
pytest -vv -n auto -m "not serial" --splits PARALLELISM --group ${{ matrix.group }}

Then, we can simply mark our tests we wish to keep in serial operation like so:

@pytest.mark.serial
def test_not_parallel_safe() -> None:
    pass
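To keep PyTest from warning about an unknown mark, register the custom serial marker in your configuration, e.g. in pytest.ini:

```ini
[pytest]
markers =
    serial: tests that are not parallel-safe and must run serially
```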

⚡️ Summary

We now have parallel-safe tests that can be adopted over time as engineering resources allow, with vertical and horizontal scaling capabilities, all while staying budget-conscious.

Cheers
