
I have a set of classes for which each property and method must be tested.

Something like

datasets = [{
    "klass": MyClass1,
    "a": 1,
    "c": 3,
},
{
    "klass": MyClass2,
    "a": 12,
    "b": 13,
}]

Note that these are just quick examples; the real test data is much larger and more complex.

Coming from Ruby and the Shoulda framework, and before I discovered how unwieldy Python testing can be, I had thought I could simply write a dynamic generator that would let me do something like

# tests/test_datasets_client_1.py
for dataset in datasets_client_1:
    generate_test_case(**dataset)

# tests/test_datasets_client_2.py
for dataset in datasets_client_2:
    generate_test_case(**dataset)

Which would really be calling

# tests/test_libraries.py

def generate_test_case(**kwargs):
    # do some stuff to register TestCase(**kwargs) so that it's picked up by the testing framework
    # stuff like markers and the test name would also be dynamically set here based on the kwargs


class TestCaseDynamic:
    def __init__(self, klass, **kwargs):
        self.instance = klass(kwargs["some_specific_parameter"])
        self.kwargs = kwargs  # or setattr, doesn't matter as long as they're available in the rest of the instance methods

    def test_a(self):
        if self.kwargs.get("a"):
            assert self.instance.a == self.kwargs["a"]

    def test_b(self):
        if self.kwargs.get("b"):
            assert self.instance.b > self.kwargs["b"]

    def test_c(self):
        if self.kwargs.get("c"):
            assert self.instance.c <= self.kwargs["c"]
        

However, Python testing is much worse than I anticipated, and there does not seem to be an immediate way to do this. Can anyone point me in the right direction? How can I dynamically generate a huge number of these tests without losing my sanity in the process?

How can I make this test class behave like an actual instance with persistent state? self.instance.b and self.instance.c could be calling the same expensive method internally, whose result is cached on the instance, so why would I have to wait 5 minutes for each of these tests when in the real world it would only be computed once?

Given Python's dynamism I had thought these were all rhetorical questions with easy answers, but after dipping my toes into pytest I am not so sure anymore.

Every example I've seen is overly complicated and relies on metaclasses, nested decorators and other hard-to-understand code that I have to sit down and study to achieve something that is basic, obvious behavior in other languages. I found previous answers like https://stackoverflow.com/a/35580034/7376511, but there was no way to actually call that class dynamically with the dataset without redefining it in every test file, which defeats the aim of declaring the testing scenario once and importing it everywhere else.

Some Guy

1 Answer


I have often had similar problems in the past. Especially when developing hardware-related integration tests, I often had to test devices that are very similar, but never so similar that you could simply run the same tests twice with a different parametrization. Often the test logic stays the same, but the action calls differ slightly. And even when there are multiple ways to do something (e.g. once via an old API and once via a new one), I often had to implement the tests multiple times.

That was also the reason why I started to develop a new test framework called Balder. Balder uses a clear separation between what I need for a test (a Scenario) and what I have (a Setup). Whether a scenario (which holds the test logic) matches a setup (which holds the device/class-specific logic) is determined by Balder automatically.

Of course the structure of your environment depends on your real application, but maybe this example helps you.

So, to create your tests, you write a scenario for every test:

# scenario_tests.py
import balder
from features import ClassWithAValueFeature, GetExpectedAValue

# the scenario for testing the `a` value
class ScenarioTestA(balder.Scenario):
    # you only have one device - your class
    class ClassUnderTest(balder.Device):
        # your device has one feature that provides the value `a` from the class-instance
        klass = ClassWithAValueFeature()
        # and it has one feature that provides the expected value for `a`
        expected_a = GetExpectedAValue()

    # this is the test itself
    def test_a(self):
        assert self.ClassUnderTest.klass.a_value == self.ClassUnderTest.expected_a.value

Since you only define what you really need in the scenario, your features don't really have to implement much here:

# features.py
import balder


# you only need something to access the `a` property of your class
# (how this is done is not of interest on scenario-level)
class ClassWithAValueFeature(balder.Feature):
    @property
    def a_value(self):
        raise NotImplementedError()


# and you need your expected value
class GetExpectedAValue(balder.Feature):
    @property
    def value(self):
        raise NotImplementedError()

Now you also need a setup (what you have). In order for the setup to map to the scenario, the setup device must implement at least all the features of the scenario device.

import balder
import features
from myclass1 import MyClass1


# this is a helper feature which manages your class `MyClass1`
# (it is not part of your scenario, but this does not matter on setup-level)
class MyClass1InstanceFeature(balder.Feature):
    instance = None

    def create_instance(self):
        self.instance = MyClass1()


# this is the setup implementation of the scenario-level feature 
# `ClassWithAValueFeature` (subclassing it)
class MyClass1ValueAFeature(features.ClassWithAValueFeature):
    # here we reference the helper feature from above (the instance 
    # management is done in background automatically)
    instance_feature = MyClass1InstanceFeature()

    # overwrite the abstract property from the `ClassWithAValueFeature` 
    # on scenario-level
    @property
    def a_value(self):
        return self.instance_feature.instance.a


# just describe the expected value
class MyClass1ExpectedAVal(features.GetExpectedAValue):
    value = 1


# and your setup itself, it just describes which features are implemented by your device
class SetupMyClass1(balder.Setup):

    class Client1(balder.Device):
        instance = MyClass1InstanceFeature()
        a_value = MyClass1ValueAFeature()
        a = MyClass1ExpectedAVal()

    # this fixture will ensure that your instance is only created once 
    # (for all scenarios that map to this setup with `MyClass1`)
    @balder.fixture('setup')
    def create_instance(self):
        self.Client1.instance.create_instance()

When you run Balder, the scenario will be executed with your setup:

$ balder

+----------------------------------------------------------------------------------------------------------------------+
| BALDER Testsystem                                                                                                    |
|  python version 3.10.6 (main, Mar 10 2023, 10:55:28) [GCC 11.3.0] | balder version 0.1.0b6                           |
+----------------------------------------------------------------------------------------------------------------------+
Collect 1 Setups and 1 Scenarios
  resolve them to 1 mapping candidates

================================================== START TESTSESSION ===================================================
SETUP SetupMyClass1
  SCENARIO ScenarioTestA
    VARIATION ScenarioTestA.ClassUnderTest:SetupMyClass1.Client1
      TEST ScenarioTestA.test_a [.]
================================================== FINISH TESTSESSION ==================================================
TOTAL NOT_RUN: 0 | TOTAL FAILURE: 0 | TOTAL ERROR: 0 | TOTAL SUCCESS: 1 | TOTAL SKIP: 0 | TOTAL COVERED_BY: 0

Now if you also want to test your second class, just add a new setup:

import balder
import features
from myclass2 import MyClass2


class MyClass2InstanceFeature(balder.Feature):
    instance = None

    def create_instance(self):
        self.instance = MyClass2()


class MyClass2ValueAFeature(features.ClassWithAValueFeature):
    instance_feature = MyClass2InstanceFeature()

    @property
    def a_value(self):
        return self.instance_feature.instance.a


class MyClass2ExpectedAVal(features.GetExpectedAValue):
    value = 12


class SetupMyClass2(balder.Setup):

    class Client2(balder.Device):
        instance = MyClass2InstanceFeature()
        a_value = MyClass2ValueAFeature()
        a = MyClass2ExpectedAVal()

    @balder.fixture('setup')
    def create_instance(self):
        self.Client2.instance.create_instance()

Now Balder will run the test with both setups:

$ balder

+----------------------------------------------------------------------------------------------------------------------+
| BALDER Testsystem                                                                                                    |
|  python version 3.10.6 (main, Mar 10 2023, 10:55:28) [GCC 11.3.0] | balder version 0.1.0b6                           |
+----------------------------------------------------------------------------------------------------------------------+
Collect 2 Setups and 1 Scenarios
  resolve them to 2 mapping candidates

================================================== START TESTSESSION ===================================================
SETUP SetupMyClass1
  SCENARIO ScenarioTestA
    VARIATION ScenarioTestA.ClassUnderTest:SetupMyClass1.Client1
      TEST ScenarioTestA.test_a [.]
SETUP SetupMyClass2
  SCENARIO ScenarioTestA
    VARIATION ScenarioTestA.ClassUnderTest:SetupMyClass2.Client2
      TEST ScenarioTestA.test_a [.]
================================================== FINISH TESTSESSION ==================================================
TOTAL NOT_RUN: 0 | TOTAL FAILURE: 0 | TOTAL ERROR: 0 | TOTAL SUCCESS: 2 | TOTAL SKIP: 0 | TOTAL COVERED_BY: 0

If you now also add the scenarios for your attributes b and c (and of course the setup-level features for them), Balder will run all the tests that match your respective setups. A sketch of such an additional scenario is shown below.
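For illustration, a scenario for b could look like the following, mirroring ScenarioTestA above. This is only a sketch: the feature classes ClassWithBValueFeature and GetExpectedBValue are hypothetical and would have to be declared in features.py and implemented on setup-level, just like their `a` counterparts.

# scenario_test_b.py -- sketch only
import balder
# hypothetical features, defined analogously to the `a` features in features.py
from features import ClassWithBValueFeature, GetExpectedBValue


class ScenarioTestB(balder.Scenario):
    class ClassUnderTest(balder.Device):
        # provides the `b` value of the class instance
        klass = ClassWithBValueFeature()
        # provides the expected threshold for `b`
        expected_b = GetExpectedBValue()

    def test_b(self):
        # mirrors the check from the question: the instance value must exceed the expected one
        assert self.ClassUnderTest.klass.b_value > self.ClassUnderTest.expected_b.value

Because the setup-level `b` features would reuse the same instance-management feature (MyClass1InstanceFeature / MyClass2InstanceFeature) and the setup fixture from above, the expensive class instance is still only created once per setup.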

There is a small amount of extra work due to the separation, but it pays off, especially when you want to reuse tests flexibly. You save a lot of time when you want to use the tests for other, similar devices, because you only need to provide your specific binding code and no test logic at all. This method has helped me a lot so far, especially in larger projects.

You can also put the scenario code into a package and send it to colleagues who want a similar test. They then only have to adapt their specific setup code and get the tests for free. I've already published one such package for testing SNMP (though it doesn't contain many tests yet), and two more are in progress.

I also created a repo for this example with individual commits, which may make it clearer how this works:

  1. one scenario for attribute `a` and the setup for class `MyClass1`
  2. a version with a second setup for class `MyClass2`
  3. two more scenarios for attributes `b` and `c`

Balder can do much more. You can connect devices with each other, and Balder automatically checks whether the devices of a scenario (including their connection types) match the devices of the setup. You can also develop a feature for different peer devices and even provide different implementations for different device or connection types.

You can read more about it in the documentation or in this Medium post. Please also be aware that Balder is still in beta.

matosys