I got a hint to use optional requirements and a conditional import to provide a function that uses pandas or not, depending on whether it's available. See here for reference:
https://stackoverflow.com/a/74862141/10576322

This solution works, but when I test the code I always get poor coverage, since pandas is either imported or not. So even if I configure hatch to create environments for both cases, no single test run covers this if/else function definition completely.

Is there a proper way to, e.g., combine the two results? Or can I tell coverage that the result is expected for that block of code?

Code

The module looks like this:

try:
    import pandas as pd
    PANDAS_INSTALLED = True
except ImportError:
    PANDAS_INSTALLED = False


if PANDAS_INSTALLED:
    def my_function(*args, **kwargs):  # placeholder signature
        # magic with pandas
        return output
else:
    def my_function(*args, **kwargs):  # placeholder signature
        # magic without pandas
        return output

The idea is that the two versions of the function behave exactly the same apart from their internals. So everybody, no matter where, can use my_module.my_function and doesn't need to write code that depends on the environment they are in.

The same is true for testing. I can write tests for my_module.my_function, and if the venv has pandas the test exercises one branch, and if not, it exercises the other.

from mypackage import my_module


def test_my_function():
    res = 'foo'
    assert my_module.my_function() == res

That works fine, but evaluating the coverage is complicated.

Paths to solution

So far I am aware of two solutions.

1. mocking the behavior

@TYZ suggested always having pandas as a dependency for testing and mocking the global variable.
I tried that, but it didn't work as I expected. The reason is that I can of course mock the PANDAS_INSTALLED variable, but the function definition already took place during import and is no longer affected by the variable.
I tried to check whether I can mock the import in another test module, but didn't succeed.
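
One possible way to make the import itself fail on demand is to put `None` into `sys.modules` for the dependency and then reload the module, so the if/else definition runs again. The following is only a minimal sketch under stated assumptions: `demo_module` is a hypothetical stand-in for my_module, written to a temporary directory, and the standard-library module `colorsys` plays the role of pandas so the sketch runs without pandas installed.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical stand-in for my_module; `colorsys` plays the role of pandas.
MODULE_SOURCE = textwrap.dedent(
    """
    try:
        import colorsys  # stand-in for the optional dependency
        DEP_INSTALLED = True
    except ImportError:
        DEP_INSTALLED = False

    if DEP_INSTALLED:
        def my_function():
            return "with dependency"
    else:
        def my_function():
            return "without dependency"
    """
)

# Write the module to a temporary directory and make it importable.
tmp_dir = tempfile.mkdtemp()
Path(tmp_dir, "demo_module.py").write_text(MODULE_SOURCE)
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()

import demo_module  # first import: the dependency is available

result_with = demo_module.my_function()

# A None entry in sys.modules makes `import colorsys` raise ImportError.
saved = sys.modules["colorsys"]
sys.modules["colorsys"] = None
try:
    importlib.reload(demo_module)  # re-runs the try/except and the if/else
    result_without = demo_module.my_function()
finally:
    # Restore the real module and the original function definition.
    sys.modules["colorsys"] = saved
    importlib.reload(demo_module)
```

Inside a pytest test, the same effect could presumably be had with `monkeypatch.setitem(sys.modules, "pandas", None)` followed by `importlib.reload(my_module)`, plus another reload once the patch has been undone, so that later tests see the original definition again.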

2. defining venvs with and without pandas and combining the results

I found that coverage and pytest-cov have the ability to append test results between environments or combine different results.
In a first test I changed the pytest-cov script in hatch to include --cov-append. That worked, but it's totally global. That means if I get complete coverage in Python 3.8, but for whatever reason the switch doesn't work in Python 3.9, I wouldn't see it.

What I would like to do is combine the different results by some logic derived from hatch's test.matrix, like coverage combine py38.core py38.pandas, and the same for 3.9. That way I would see whether I have the same coverage for all tested versions.
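
The per-version combination could be sketched with coverage's Python API roughly as follows. This is an assumption-laden sketch, not a hatch feature: it assumes each environment wrote its data to a file named like `.coverage.py38.core` or `.coverage.py38.pandas` (a hypothetical naming scheme, e.g. set through the `COVERAGE_FILE` environment variable), and the helper names `group_by_python` and `combine_per_python` are made up for illustration.

```python
from collections import defaultdict
from pathlib import Path


def group_by_python(data_files):
    """Group names like '.coverage.py38.pandas' by their Python tag ('py38')."""
    groups = defaultdict(list)
    for name in data_files:
        # '.coverage.py38.pandas'.split('.') -> ['', 'coverage', 'py38', 'pandas']
        python_tag = Path(name).name.split(".")[2]
        groups[python_tag].append(name)
    return dict(groups)


def combine_per_python(data_files):
    """Write one combined data file per Python version, e.g. '.coverage.py38'."""
    import coverage  # imported lazily so the grouping helper works without it

    for python_tag, files in sorted(group_by_python(data_files).items()):
        cov = coverage.Coverage(data_file=f".coverage.{python_tag}")
        cov.combine(data_paths=files, keep=True)  # keep=True leaves the inputs in place
        cov.save()


example_groups = group_by_python(
    [".coverage.py38.core", ".coverage.py38.pandas", ".coverage.py39.core"]
)
```

Each combined file can then be reported on separately, so a missing branch in one Python version stays visible instead of being hidden by another version's results.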

I guess there are ways to do that with tox, but maybe I don't need to bring in another tool.

FordPrefect
  • In my opinion, since in the link you shared the choice of function is controlled by a global variable, in your test case you can overwrite that global variable to control which function is used, rather than depending on your environment setup. – TYZ Dec 20 '22 at 20:55
  • Ah, that's an interesting option. So I could set up the environment to have pandas and control test coverage by overriding the global variable. I will need to look that up. And in order to keep pandas only in the optional deps, I can add it to the requirements for testing. – FordPrefect Dec 20 '22 at 22:04
  • That's true, I would do that if I were you. – TYZ Dec 21 '22 at 00:58
  • @TYZ I tried it out, but the problem with mocking the global variable is that the function is defined during import. Mocking it later on has no effect anymore. I thought of creating a new test module and importing my module there with an override, but I haven't found a way to do that yet. – FordPrefect Dec 22 '22 at 08:25

1 Answer

Update: added the built-in way to do this at the bottom.

If it is a test case you're writing, shouldn't the behavior you're testing be the same regardless of whether pandas is installed or not? From the original question it appears that you'd have the function defined anyway. The intent of your unit test then ought to be -- "given these parameters, test whether the return value/behavior is this".

That said, if you want coverage with or without pandas, my recommendation would be to declare 2 differently named functions (which can be imported and unit tested separately) and to assign your runtime function depending on the flag set in the import block. Something like:


# your_code.py
try:
    import pandas as pd
    PANDAS_INSTALLED = True
except ImportError:
    PANDAS_INSTALLED = False

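def _using_pandas(*args, **kwargs):  # placeholder signature
    ...

def _not_using_pandas(*args, **kwargs):  # placeholder signature
    ...

# pick the implementation once, at import time
do_something = _using_pandas if PANDAS_INSTALLED else _not_using_pandas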

__all__ = ['do_something']

# -------------
# your_tests.py

try:
    import pandas as pd
    PANDAS_INSTALLED = True
except ImportError:
    PANDAS_INSTALLED = False

from your_code import _using_pandas, _not_using_pandas, do_something
import pytest

# skipif with a boolean condition requires a reason
@pytest.mark.skipif(not PANDAS_INSTALLED, reason="pandas is not installed")
def test_code_using_pandas():
    ...

@pytest.mark.skipif(PANDAS_INSTALLED, reason="pandas is installed")
def test_code_not_using_pandas():
    ...

def test_do_something():
    # test behavior independent of imports
    ...


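Update: Seems like there is a built-in pytest.importorskip function that can be used in your tests. It is a plain function rather than a mark: calling pd = pytest.importorskip("pandas") returns the module if it can be imported and skips the test otherwise (or the whole test module, if called at module level). This would get rid of the boilerplate for the flag and replace the first skipif mentioned above.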

lonetwin
  • Thanks for the reply. I will try to edit the question to make it more obvious. – FordPrefect Dec 22 '22 at 12:57
  • I understand your idea to define the functions separately and do the evaluation afterwards. But won't the definition of the function with pandas run into an error when pandas is missing? I am also afraid that users might import the with and without functions directly and run into problems. – FordPrefect Dec 22 '22 at 13:18
  • @FordPrefect the function `using_pandas` will potentially run into an error if called directly in an environment that doesn't have pandas, and that's the reason why it should never be called directly. Your users should always call `do_something`, which references the correct function based on import-time decisions. Remember that Python is a dynamically typed language with first-class functions, so `do_something` *is a function* after the import. – lonetwin Jan 04 '23 at 18:10
  • Yes, you are right. I decided to follow your idea and did some tests with it. I used __init__ to highlight the functions intended to face the user. – FordPrefect Jan 05 '23 at 06:36