Where to store cross-testrun state in pytest (binary files)?

Question

I have a session-level fixture in pytest that downloads several binary files that I use throughout my test suite. The current fixture looks something like the following:

@pytest.fixture(scope="session")
def image_cache(pytestconfig, tmp_path_factory):
    # A temporary directory loaded with the test image files downloaded once.

    remote_location = pytestconfig.getoption("remote_test_images")
    tmp_path = tmp_path_factory.mktemp("image_cache", numbered=False)
    
    # ... download the files and store them into tmp_path

    yield tmp_path

This used to work well, however, now the amount of data is making things slow, so I wish to cache it between test runs (similar to this question). Contrary to the related question, I want to use pytests own cache for this, i.e., I'd like to do something like the following:

@pytest.fixture(scope="session")
def image_cache(request, tmp_path_factory):
    # A temporary directory loaded with the test image files downloaded once.

    remote_location = request.config.option.remote_test_images

    tmp_path = request.config.cache.get("image_cache_dir", None)
    if tmp_path is None:
        # what is the correct location here?
        tmp_path = ...
        request.config.cache.set("image_cache_dir", tmp_path)

        # ... ensure path exists and is empty, clean if necessary

        # ... download the files and store them into tmp_path

    yield tmp_path

Is there a typical/default/expected location that I should use to store the binary data?
If not, what is a good (platform-independent) location to choose? (tests run on the three major OS: linux, mac, windows)

I think you answered your own questions -- if you use the pytest cache you linked to then it will remain platform agnostic as pytest will handle the overhead of setting/getting, no? — Teejay Bruno, Jan 14 '22 at 16:15
@TeejayBruno The python cache is only a store for json-able objects (which the binary files/images are not). My idea is to have that point to a directory that contains the actual data. The question is where should I create that directory? — FirefoxMetzger, Jan 14 '22 at 17:07

score 0 · Answer 1 · answered Nov 29 '22 at 00:08

A bit late here, but maybe I can still offer a helpful suggestion. You have at least two options, I think:

use pytest's caching solution on its own. Yes, it expects JSON-serializable data, so you'll need to convert your "binary" into strings. You can use base64 to safely encode arbitrary binary into letters that can be stored as a string and then later converted from a string back into the original binary (ie back into an image, or whatever).
use pytest's caching solution as a means of remembering a filename or a directory name. Then, use python's generic temporary file system so you can manage temporary files in a platform-independent way. In the end, you'll write just filenames or paths into pytest cache and do everything else manually.

Solution 1 benefits from fully-automatic management of cache content and will be compatible with other pytest extensions (such as distributed testing with xdist). It means that all files are kept in the same place and it is easier to see and manage disk usage.

Solution 2 is likely to be faster and will scale safely. It avoids the need to transcode to base64, which will be a waste of CPU and of space (since base64 will take a lot more space than the original binary). Additionally, the pytest cache may not be well suited for a large number of potentially large values (depending on the number of files and sizes of images we're talking about here).

Converting image / arbitrary bytes into something that can be encoded to JSON:

import base64
now_im_a_string = base64.encodebytes(your_bytes)
... 
# cache it, store it whatever
... 
im_your_bytes_again = base64.decodestring(string_read_from_cache)

Interesting approach. What we've ended up settling for is to create a `.test_images` folder inside the project's root into which images are downloaded. This follows the philosophy of pytest's `.pytest_cache`, vscode's `.vscode`, poetry's in-project `.venv`, etc. So far, this does the job fine and you can find the [fixture here](https://github.com/imageio/imageio/blob/063346495147e995583ec42de3de8b3d007ae724/tests/conftest.py#L69-L112). — FirefoxMetzger, Nov 29 '22 at 10:40

Where to store cross-testrun state in pytest (binary files)?

1 Answers1