I am new to pytest, so I may be using some pytest terminology incorrectly.
In general, I am having the following issue:
I am using mark.parametrize to supply the mocked data for a test, and when I pass the same variable in more than one parameter set, the mock returns the data left over from the previous run instead of what I specified.
In more detail:
In the first 'iteration', I use mock_data_1 in mark.parametrize to mock GetData.get_data(). As I would expect, the test receives the mocked data at data = GetData.get_data(arg) and afterwards adds a new column to it, data['new_col0'].
In the second 'iteration', where I use mock_data_1 again in mark.parametrize, the test does not get a fresh mock_data_1. It reuses the previous data, which already contains the extra column.
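A minimal sketch of the behaviour described above. To keep it dependency-free it uses a plain dict as a stand-in for the DataFrame and a free function in place of MyClass.new_dataset (both are assumptions for illustration, not the real code). It shows that a mock's return_value is handed back as the very same object on every call, so an in-place change made in one run is still visible in the next:

```python
from unittest.mock import Mock

# Stand-in for mock_data_1 (a dict instead of a DataFrame; assumption).
shared_data = {"col_1": [1, 2, 3]}


def new_dataset(get_data, arg):
    """Mirrors the shape of MyClass.new_dataset: fetch, then mutate in place."""
    data = get_data(arg)
    data[f"new_col{arg}"] = [arg] * 3
    return data


# Two 'iterations': a new mock each time, but the same return_value object.
first = new_dataset(Mock(return_value=shared_data), 0)
second = new_dataset(Mock(return_value=shared_data), 1)

print(first is shared_data)  # True: the mock returned the original object
print(sorted(second))        # ['col_1', 'new_col0', 'new_col1']
```

The objects listed in mark.parametrize are built once, when the class body is executed, so both parameter sets refer to that single mock_data_1 object.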
These are some sample files:
file.py
from test_file_get_data import GetData


class MyClass:
    def new_dataset(arg):
        data = GetData.get_data(arg)  # Mock this part
        data[f'new_col{arg}'] = arg  # New column to data
        return data
test_file.py
from file import MyClass
import pandas as pd
import pytest


class TestMyClass:
    mock_data_1 = pd.DataFrame({"col_1": [1, 2, 3]})
    arg_1 = 0
    arg_2 = 1
    output_1 = pd.DataFrame({"col_1": [1, 2, 3], "new_col0": [0, 0, 0]})
    output_2 = pd.DataFrame({"col_1": [1, 2, 3], "new_col1": [1, 1, 1]})

    @pytest.mark.parametrize(
        'mock_arguments, arg, result',
        [
            (mock_data_1, arg_1, output_1),
            (mock_data_1, arg_2, output_2)
        ]
    )
    def test_new_dataset(self, mocker, mock_arguments, arg, result):
        mocker.patch(
            'file.GetData.get_data',
            return_value=mock_arguments,
        )
        print(mock_arguments)
        res = MyClass.new_dataset(arg)
        print(res)
        assert res.to_dict() == result.to_dict()
test_file_get_data.py
import pandas as pd


class GetData:
    def get_data(arg):
        data = pd.DataFrame({"a": [1, 2, 3]})
        return data
So the first test passes, but the second one fails because the data returned is this:
{'col_1': {1, 2, 3},
'new_col0': {0, 0, 0},
'new_col1': {1, 1, 1}}
instead of this:
{'col_1': {1, 2, 3},
'new_col1': {1, 1, 1}}
This issue can be solved if I replace data = GetData.get_data(arg) with data = GetData.get_data(arg).copy(), but I assume I am doing something wrong in the tests.
Shouldn't the data be refreshed and/or deleted after every iteration? Or is this expected behavior?
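For comparison with the .copy() workaround, here is a sketch of a test-side variant that leaves the code under test unchanged. It again uses a plain dict as a stand-in for the DataFrame, and fresh_copy_mock is a hypothetical helper name introduced only for this illustration. Instead of return_value, the mock gets a side_effect that builds a new copy on every call:

```python
import copy
from unittest.mock import Mock

# Stand-in for mock_data_1 (a dict instead of a DataFrame; assumption).
shared_data = {"col_1": [1, 2, 3]}


def new_dataset(get_data, arg):
    """Mirrors the shape of MyClass.new_dataset: fetch, then mutate in place."""
    data = get_data(arg)
    data[f"new_col{arg}"] = [arg] * 3
    return data


def fresh_copy_mock(template):
    """Hypothetical helper: every call returns a new deep copy of template."""
    return Mock(side_effect=lambda *args, **kwargs: copy.deepcopy(template))


first = new_dataset(fresh_copy_mock(shared_data), 0)
second = new_dataset(fresh_copy_mock(shared_data), 1)

print(sorted(first))        # ['col_1', 'new_col0']
print(sorted(second))       # ['col_1', 'new_col1']
print(sorted(shared_data))  # ['col_1']: the template is left untouched
```

In the test from the question, this would presumably correspond to passing side_effect=lambda *args, **kwargs: mock_arguments.copy() to mocker.patch in place of return_value=mock_arguments. That variant has not been run here.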