Pytest: use a generator for mark.parametrize

Question

I have a MongoDB database with a very large collection that I need to run tests on with pytest. I am trying the usual route of the mark.parametrize decorator, but with a pymongo.cursor.Cursor object:

def get_all_data():
    return db["collection"].find({}) # query to retrieve all documents from the collection

@pytest.mark.parametrize("doc", get_all_data())
def test_1(doc):
    assert doc["val"] == 1
    ....

The problem with this code is that pytest, during the collection stage before any test runs, automatically converts the generator into a list. I don't want this for two reasons:

Speed: building the list is very slow because the collection is very large.
Out of memory: there is not enough RAM to load all of this data anyway.

This means I cannot use mark.parametrize. How can I still use a generator to run the tests one at a time, without loading everything into memory up front? Is that even possible with pytest?

it is not, during the collection phase pytest must know all of the tests that are to be run before continuing onwards — anthony sottile, Nov 28 '21 at 16:23
If so, are there other tools I can use to test very large datasets? (much larger than my RAM) — tHeReaver, Nov 28 '21 at 18:29
Yes. I am using millions of documents to test if an algorithmic component is working properly — tHeReaver, Nov 28 '21 at 18:51
how can you possibly know if it's correct if there's millions of inputs — anthony sottile, Nov 28 '21 at 18:57
I do regression testing, so the new version is always compared either to a previous one or a golden standard one. This creates a situation where my database is populated by varying vectors from all the stable release versions. — tHeReaver, Nov 28 '21 at 19:58
you're not really running "tests" at that point but an integration pipeline -- for which a unit testing tool (such as pytest) is not going to be helpful — anthony sottile, Nov 28 '21 at 20:00

score 1 · Accepted Answer · answered Nov 29 '21 at 07:20

I can think of one workaround: write a fixture that passes the generator to a single test, then check each entry individually inside that test using pytest-check (I assume you need to assert each entry separately and continue even if some entries fail).

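import pytest
import pytest_check as check  # provided by the pytest-check plugin

@pytest.fixture
def get_all_data():
    # db is the pymongo database handle from the question.
    # Yield the lazy cursor itself; documents are fetched as the test iterates.
    yield db["collection"].find({})

def test_1(get_all_data):
    for each in get_all_data:
        # Non-fatal check: records a failure and keeps iterating.
        check.equal(each["val"], 1)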
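
If pytest-check is not an option, the same idea can be sketched with the standard library alone: stream the documents one at a time and record the failing ones instead of stopping at the first. This is only a sketch under assumptions, not part of the answer above: `fake_cursor`, `find_failures` and the `max_failures` cap are hypothetical names, and the generator merely stands in for the pymongo cursor.

```python
from itertools import islice
from typing import Any, Callable, Dict, Iterable, Iterator, List

Doc = Dict[str, Any]


def fake_cursor(n: int) -> Iterator[Doc]:
    """Hypothetical stand-in for db["collection"].find({}): yields docs lazily."""
    for i in range(n):
        # Every 1000th document is deliberately wrong, to exercise the failure path.
        yield {"_id": i, "val": 0 if i % 1000 == 999 else 1}


def find_failures(
    docs: Iterable[Doc],
    is_ok: Callable[[Doc], bool],
    max_failures: int = 10,
) -> List[Any]:
    """Stream docs and return the ids of failing ones, at most max_failures."""
    # Generator expression: only one document is held in memory at a time.
    failing_ids = (doc["_id"] for doc in docs if not is_ok(doc))
    # islice stops pulling from the stream once the cap is reached.
    return list(islice(failing_ids, max_failures))


if __name__ == "__main__":
    print(find_failures(fake_cursor(5000), lambda doc: doc["val"] == 1))
```

Inside a test, asserting that the returned list is empty would report the first few failing ids in a single failure message. Capping the list keeps memory bounded even when most documents fail.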

Shod

Maybe you could add `check` fixture into the test function signature (`def test_1(get_all_data, check)`) so that it's clear where it comes from. – tmt Nov 29 '21 at 10:49

Yeah, that's true for fixtures in general. Pytest does a lot of magic. For `check`, though, it's an import from `pytest_check` – Shod Nov 29 '21 at 11:15
