I am doing some Hypothesis testing on async tests. My code creates and alters databases in real time, and I'm facing a problem with cleanup.

Most of the time, I can clean up the database without a problem. The only time it gets a bit messy is when a test fails, but that's not really a problem: pytest still shows me the error, so I can fix the code.

But that's not true with Hypothesis. Here is my test:

@given(st.text(min_size=1, max_size=128))
@pytest.mark.asyncio
async def test_hypothesis_add_column(name):
    assume('\x00' not in name)
    database = await get_database()
    project = await database.create_project('test_project')
    source = await project.create_source(
        'test_source',
        [
            lib.ColumnDefinition(
                name='external_id',
                type=lib.ColumnTypes.NUMERIC,
                is_key=True
            )
        ]
    )
    await source.add_column(lib.ColumnDefinition(
        name=name,
        type=lib.ColumnTypes.TEXT
    ))
    await end_database(database)
    assert len(source.columns) == 2
    assert await source.column(name) is not None
    assert (await source.column(name)).internal_name.isidentifier()

This test raises an error. That's OK: it means there's a bug in my code, so I should fix it. But then, on the next run by Hypothesis, there is another error at a different point (basically it cannot do the "create_source" because the database is messed up).

My problem is that Hypothesis keeps testing AFTER the initial failure, even with report_multiple_bugs=False in my profile. And then it reports the bug like this:

 hypothesis.errors.Flaky: Inconsistent test results! Test case was Conclusion(status=Status.INTERESTING, interesting_origin=(<class 'asyncpg.exceptions.PostgresSyntaxError'>, 'asyncpg/protocol/protocol.pyx', 168, (), ())) on first run but Conclusion(status=Status.INTERESTING, interesting_origin=(<class 'asyncpg.exceptions.InternalServerError'>, 'asyncpg/protocol/protocol.pyx', 201, (), ())) on second

And the worst part is that the pytest dump is for the second run (the InternalServerError one), and I can't find the first run (the PostgresSyntaxError one). The information I actually need for debugging is from the first run. I don't even understand why it keeps trying after a failure, especially when I have configured it not to report multiple errors.

Is there a way to make it stop doing this and avoid those "Interesting" cases? I'd rather have the nice and clean explanation from Hypothesis.

Thank you!

1 Answer

The quick-and-dirty answer is to adjust the phases setting to exclude Phase.shrink.
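
As a rough sketch of that workaround (the name NO_SHRINK and the toy test are made up for illustration; only settings, Phase, given and strategies come from Hypothesis), the phases can be listed explicitly with Phase.shrink left out. Phase.explain is also left out here on the assumption that it is of little use without shrinking:

```python
from hypothesis import Phase, given, settings, strategies as st

# Run every phase except shrinking. NO_SHRINK is an illustrative name,
# not something defined in the question.
NO_SHRINK = settings(
    phases=[Phase.explicit, Phase.reuse, Phase.generate, Phase.target],
)


@NO_SHRINK
@given(st.text(min_size=1, max_size=128))
def test_example(name):
    # Placeholder body; the real test from the question would go here.
    assert len(name) >= 1
```

The same phases list could probably go into the settings profile already mentioned in the question, instead of being applied per test.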

The real answer is that to get much out of Hypothesis, you'll need to make sure that running the same input twice produces the same behavior, i.e. ensure that you clean up any corrupted database state on failure (e.g. using a context manager). Shrinking works by re-running the test body on many variations of the failing input, so any state left behind by one run changes the outcome of the next. This is more work, sorry, but getting reproducible tests and minimal failing examples is worth it!
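
A minimal sketch of the context-manager idea, assuming that end_database from the question is what resets the database. FakeDatabase, drop_everything, fresh_database and demo are hypothetical stand-ins so that the snippet runs without PostgreSQL; they are not part of the question's library:

```python
import asyncio
from contextlib import asynccontextmanager


class FakeDatabase:
    """Hypothetical stand-in for the database object in the question."""

    def __init__(self):
        self.projects = []
        self.closed = False

    async def create_project(self, name):
        self.projects.append(name)

    async def drop_everything(self):
        self.projects.clear()


async def get_database():
    # Stand-in for the question's get_database().
    return FakeDatabase()


async def end_database(database):
    # Stand-in for the question's end_database(): reset state, then close.
    await database.drop_everything()
    database.closed = True


@asynccontextmanager
async def fresh_database():
    database = await get_database()
    try:
        yield database
    finally:
        # Runs whether the body passed or raised, so a failing example
        # cannot leave state behind for the next one.
        await end_database(database)


async def demo():
    seen = []
    try:
        async with fresh_database() as database:
            seen.append(database)
            await database.create_project("test_project")
            raise RuntimeError("simulated test failure")
    except RuntimeError:
        pass
    return seen[0]


db = asyncio.run(demo())
print(db.closed, db.projects)
```

In the test from the question, the body between get_database() and end_database() would move inside the async with block, so the cleanup no longer depends on every earlier await succeeding.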

Zac Hatfield-Dodds
  • I totally agree that I should ensure that tests are reproducible (and I am actually working on that part too!). And the "Interesting" behaviour is really good for that! But the problem for me is how it reports the results: when a behaviour isn't reproducible, the test result is basically unusable, which makes fixing it much harder, as you can't tell which test case is non-reproducible or get exact debug output unless you get lucky with the parameters. – Yann PIQUET Jun 12 '22 at 09:23