Pattern for reading and writing files with pytest

Question

constants.py

import os

BASE_PATH = os.path.abspath(os.path.dirname(__file__))
INPUT_PATH = os.path.join(BASE_PATH, 'input')
FILE_INPUT1_PATH = os.path.join(INPUT_PATH, 'input1.csv')
FILE_INPUT2_PATH = os.path.join(INPUT_PATH, 'input2.csv')
PROCESSED_PATH = os.path.join(BASE_PATH, 'processed')
FILE_PROC1_PATH = os.path.join(PROCESSED_PATH, 'processed1.pkl')
FILE_PROC2_PATH = os.path.join(PROCESSED_PATH, 'processed2.pkl')

Directory structure:

root
  |__ constants.py
  |__ input
          |__ input1.csv
          |__ input2.csv
  |__ processed
          |__ processed1.pkl
          |__ processed2.pkl

data_handling.py

from constants import FILE_INPUT1_PATH, FILE_INPUT2_PATH, FILE_PROC1_PATH, FILE_PROC2_PATH

def foo(*args):
    file = FILE_INPUT1_PATH
    # Here it does stuff with the input file
    # Finally, it writes the data to FILE_PROC1_PATH

def bar(*args):
    file = FILE_INPUT2_PATH
    # Here it does stuff with the input file
    # Finally, it writes the data to FILE_PROC2_PATH

I'm trying to test foo() and bar() with pytest, but I don't know how to proceed: the input and processed files are too big, and the tests mustn't overwrite the processed files. One approach is to change the definition of bar() to bar(path) and then call bar(FILE_INPUT2_PATH), but that doesn't make sense in the code, because bar always needs to read FILE_INPUT2_PATH and it is called in many places. A unit test for foo() and bar() would check whether the processed files were created, because that depends on *args.

So the question is: how can I solve this? Is there a pattern or good practice for this case? What should I change in my code?

One option is to [patch file path constants](https://stackoverflow.com/questions/27252840/how-to-patch-a-constant-in-python) to point to files with test data. — 9dogs, Dec 22 '17 at 19:14
@9dogs thank you! That is what I wanted. If you can, please post it as an answer — Cristhian Boujon, Dec 27 '17 at 18:01

score 4 · Accepted Answer · answered Dec 28 '17 at 08:51

input files and processed files are too big and test process mustn't override processed files

Yes, and tests are perfectly suitable for that kind of job. The generic approach is to create test data (which can be a subset of the original data with edge cases included) and place it somewhere near your tests, for example:

├───tests
│   │   test_bar.py
│   │   test_foo.py
│   │
│   └───data
│           input_1.dat
│           input_2.dat
│           expected_1.pkl
│           expected_2.pkl
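
One possible way to build such a subset is sketched below. It assumes the real input is a CSV file with a header row and no line breaks inside quoted fields; the helper name write_sample and the paths in the usage line are hypothetical. Edge cases would still have to be added to the sample by hand.

```python
from itertools import islice
from pathlib import Path


def write_sample(src: Path, dst: Path, n_rows: int = 100) -> int:
    """Copy the header line plus the first n_rows data lines from src to dst.

    Returns the number of lines written. The source is read lazily, line by
    line, so a large file is never loaded into memory at once.
    """
    dst.parent.mkdir(parents=True, exist_ok=True)
    written = 0
    with src.open(newline="") as fin, dst.open("w", newline="") as fout:
        for line in islice(fin, n_rows + 1):  # +1 keeps the header line
            fout.write(line)
            written += 1
    return written
```

For example, write_sample(Path('input/input1.csv'), Path('tests/data/input_1.dat'), n_rows=50) would produce a small file that can be committed next to the tests.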

Then, if the functions under test take their input from a constant rather than a parameter, use unittest.mock.patch to replace the constant during the test run (see this excellent answer for a quick reference). The output can be stored in either a regular or a temporary file.

import tempfile
from pathlib import Path
from unittest.mock import patch

import foo_module  # the module that defines foo() and imports the path constants


TEST_DATA_DIR = Path(__file__).resolve().parent / 'data'


# Patch the names inside foo_module, where foo() looks them up.
@patch('foo_module.FILE_INPUT1_PATH', TEST_DATA_DIR / 'input_1.dat')
@patch('foo_module.FILE_PROC1_PATH', tempfile.mktemp())
def test_foo():
    """Process input and check result."""
    foo_module.foo()
    # Compare the produced file with the expected one byte for byte.
    result = Path(foo_module.FILE_PROC1_PATH).read_bytes()
    expected = (TEST_DATA_DIR / 'expected_1.pkl').read_bytes()
    assert result == expected

NOTE: tempfile.mktemp() is deprecated: it only returns a file name and does not create the file, so another process can create a file with that name before yours does. Feel free to suggest an alternative approach.
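
One possible alternative, sketched under some assumptions: let the test receive a temporary directory (pytest's built-in tmp_path fixture provides one per test as a pathlib.Path) and patch both constants inside the test body with patch.object. To keep the example self-contained, foo_module below is a hypothetical stand-in built inline, and its foo() simply pickles the CSV rows; in a real project you would import your own module instead.

```python
import csv
import pickle
import types
from pathlib import Path
from unittest.mock import patch

# Hypothetical stand-in for the real module; it holds the same two constants.
foo_module = types.ModuleType("foo_module")
foo_module.FILE_INPUT1_PATH = Path("input/input1.csv")
foo_module.FILE_PROC1_PATH = Path("processed/processed1.pkl")


def _foo():
    """Assumed behavior: read the input CSV and pickle its rows."""
    with open(foo_module.FILE_INPUT1_PATH, newline="") as src:
        rows = list(csv.reader(src))
    with open(foo_module.FILE_PROC1_PATH, "wb") as dst:
        pickle.dump(rows, dst)


foo_module.foo = _foo


def test_foo(tmp_path):
    """Run foo() against a tiny input and a throwaway output file."""
    input_file = tmp_path / "input_1.dat"
    input_file.write_text("a,b\n1,2\n")
    output_file = tmp_path / "processed_1.pkl"

    # Both constants are restored when the with block exits.
    with patch.object(foo_module, "FILE_INPUT1_PATH", input_file), \
         patch.object(foo_module, "FILE_PROC1_PATH", output_file):
        foo_module.foo()

    assert output_file.exists()
    with output_file.open("rb") as fh:
        assert pickle.load(fh) == [["a", "b"], ["1", "2"]]
```

Because the output path lives inside a directory that belongs to this test only, no file name has to be guessed in advance, and the real processed files are never touched.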
