3

I'm fairly experienced with Python as a tool for data science, but I have no CS background (though I'm eager to learn).

I've inherited a 3K-line Python script that simulates thermal effects on a part of a machine. It was built organically by physics people used to MATLAB. I've cleaned it up and modularized it (put it into a class and functions). Now I want an easy way to be certain it's working correctly after someone updates it. There have been some frustrating debugging sessions lately, and I figure testing of some form can help there.

My question is: how do I even get started in the case of a large existing script? I see pytest and unittest, but is that where I should start? The code is roughly structured like this:

class Simulator:
    parameters = input_file  # parameters read from an input file

    def __init__(self):
        self.fn1()
        self.fn2()
        self.fn3()

    def fn1(self):
        # with nested functions
        ...

    def fn2(self):
        ...

    def fn3(self):
        ...

    # ... and so on, up to fn_n

Each function either generates or acts on some data. Would one way to test be to run some standardized input/output and check against that? Is there a way to do this within the standard conventions of testing?

Appreciate any advice or tips, cheers!

Declan
  • You should probably write tests against the unmodified code. Then test after each modification you make. Only after the code is 'cleaned up' should you begin changing what it does - and where necessary writing new tests that test for 'correctness' rather than reproducibility. – match May 01 '20 at 16:50
  • @match's comment is probably the answer. Anyway, if you run into problems again, a good IDE and debugger (I personally use PyCharm) can make debugging a lot more comfortable, as you can watch variables etc. in real time and pause the program at any line. – Jakob Schödl May 01 '20 at 16:53
  • Test, test, test. Start with unittest, it is built in. Write tests for functions/methods. Search pyvideo.org for testing videos made at PyCons. A lot (most?) of what you learn by using unittest will transfer to other testing regimes if you decide to try something else. – wwii May 01 '20 at 16:58

5 Answers

3

Broadly speaking, you test a function by calling it with arguments and checking if the return value is what you expect it to be. This means you should know beforehand how you expect your function to behave.

Here's a test for a simple add function:

def add(a, b):
    return a + b

def test_add_function():
    a = 1
    b = 2
    assert add(a, b) == 3  # we KNOW that adding 1 + 2 must equal 3

If you call test_add_function and no AssertionError is raised, congrats! Your test passed.

Of course, testing gets messier if you don't have "pure" functions, but rather objects that operate on shared data, like classes. Still, the logic is basically the same: call the function and check whether the expected result actually happens:

class MyClass:
    def __init__(self, a):
        self.a = a

    def add_one_to_a(self):
        self.a += 1


def test_method_add_one_to_a():
    initial_a = 1
    instance = MyClass(a=1)
    assert instance.a == initial_a  # we expect this to be 1
    instance.add_one_to_a()  # instance.a is now 2
    assert instance.a == initial_a + 1  # we expect this to be 2

I suggest reading/watching some tutorials on Python's unittest module to get your feet wet, especially getting used to the unittest.TestCase class, which helps a lot with common test operations: set up/tear down routines (which let you e.g. "refresh" your Simulator instance between tests), testing whether an error is raised when a function is called with wrong arguments, etc.
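
For example, here is a minimal sketch of a TestCase that rebuilds the simulator before every test (the module name, method names and arguments are only placeholders, since I don't know your actual API):

import unittest

from my_simulation import Simulator  # hypothetical module and class names


class TestSimulator(unittest.TestCase):
    def setUp(self):
        # runs before every test method, so each test gets a fresh instance
        self.sim = Simulator()

    def test_fn1_produces_a_result(self):
        result = self.sim.fn1()          # assumes fn1 returns some data
        self.assertIsNotNone(result)

    def test_fn2_rejects_invalid_input(self):
        with self.assertRaises(ValueError):   # assumes fn2 validates its input
            self.sim.fn2(temperature=-999)


if __name__ == '__main__':
    unittest.main()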

There are of course other strategies for when things are more complicated than this (as they often are), like mocking, which basically allows you to inspect any object/function called or modified by another object/function, check whether it was called, what arguments were used, and so on.
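
As a rough illustration (the load_input helper and the parameters attribute below are assumptions for the sake of the example, not your real code), unittest.mock lets you replace an expensive or file-reading dependency and then assert how it was used:

from unittest.mock import patch

from my_simulation import Simulator  # hypothetical names again


def test_simulator_loads_its_input():
    # replace the (assumed) file-loading helper so the test needs no real input file
    with patch('my_simulation.load_input', return_value={'temp': 20.0}) as fake_load:
        sim = Simulator()
        fake_load.assert_called_once()             # the dependency was used exactly once
        assert sim.parameters == {'temp': 20.0}    # and its fake return value was stored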

If testing your functions is still too complex, this probably means that your code isn't modularized enough, or that your functions try to perform too many things at once.

jfaccioni
2

Hope everything is alright with you!

pytest is a good fit for simple cases like yours (a single-file script).

It's really simple to get started. Just install it using pip:

pip install -U pytest

Then create a test file (pytest will run all files of the form test_*.py or *_test.py in the current directory and its subdirectories):

# content of test_fn1.py

from your_script import example_function

def test_1():
    assert example_function(1, 2, 3) == 'expected output'

You can add as many tests to this file as you want, and as many test files as you like. To run them, go to the folder in a terminal and just execute pytest. For organization's sake, create a folder named test with all the test files inside. If you do this, pay attention to how you import your script, since the test files won't be in the same folder as it anymore.
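
For example (your_script.py is just a placeholder name), a layout along these lines is common:

project/
    your_script.py
    tests/
        test_fn1.py    # contains: from your_script import example_function

One simple way to keep that import working is to run python -m pytest from the project folder, since invoking pytest through python also adds the current directory to sys.path.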

Check pytest docs for more information.

Hope this helps! Stay safe!

André Alves
  • Thank you, I ended up taking the advice from all the really good answers, but on your suggestion I went with pytest and it's working great. – Declan Jul 03 '20 at 09:24
2

If you are savvy enough with Python, but struggle to adopt unit testing meaningfully for a set of scripts that take data in and transform it, I would take a different approach, at least if the output is deterministic:

  • store sample data somewhere and keep it under source control

  • run individual functions against that test data.

    • record the output. this is assumed to be "known good"/baseline.

      • one challenge is that you may have to "scrub" out continuously-varying data like timestamps or GUIDs.
      • sort, sort, sort. plenty of data comes out unsorted, and is good as long as all the records are correct. you cannot compare anything meaningfully under those circumstances, so you'll need to sort in a deterministic fashion.
      • file diffing typically works on a line-by-line basis, so it's best to "explode" multiple fields in 1 row into 1 field per row, possibly with a label: <row1key>.f1 : <value1>\n<row1key>.f2 : <value2>
    • at this point, you don't need to validate anything via this mechanism. (traditional unittesting approaches can still be used elsewhere)

  • whenever you modify/refactor the code, run the sample data against the relevant functions (a small pytest sketch of this appears after the list).

    • compare your new output against your previous baseline. if it doesn't match you have 2 possibilities:

      • the new code produces better output, i.e. it fixes something. the new output becomes the new baseline, so store it as such.
      • the old output is better. fix the new code until you get the same output again.
    • if you store the output in text/json/yaml form, you can leverage diff-type utilities such as WinMerge, Beyond Compare (never used), diff, opendiff, etc. to help you find points of divergence. In fact, at the start it's often easier to have Python just write the output files without checking for equality, and then use file diff tools to compare multiple last-run vs current-run files.
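
A minimal pytest sketch of this baseline comparison might look like the following (the file paths, the import, and the assumption that fn1 returns plain JSON-serializable data are all made up for illustration):

import json
from pathlib import Path

from your_script import Simulator  # placeholder import

BASELINE = Path('tests/baselines/fn1_output.json')  # known-good output, kept under source control


def test_fn1_matches_baseline():
    sim = Simulator()
    result = sim.fn1()  # assumed to return plain, JSON-serializable data

    current = json.dumps(result, indent=2, sort_keys=True)  # sorted keys give a stable diff

    if not BASELINE.exists():  # first run: record the baseline instead of failing
        BASELINE.parent.mkdir(parents=True, exist_ok=True)
        BASELINE.write_text(current)

    assert current == BASELINE.read_text(), (
        'Output changed: diff the two versions to decide whether this is '
        'a fix (update the baseline) or a regression (fix the code).'
    )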

This sounds rather naive. After all, your "tests" don't really know what is going on. But it is a surprisingly powerful method to achieve stability and refactorability against an existing codebase that takes in lots of data and outputs lots of already acceptable results. This is especially true when you don't know the codebase well yet. It is less useful against a new codebase.

Note that you can always use regular pytest/unittest techniques to feed your functions more limited, carefully crafted test data that exercises some particular aspect.

I've done this a number of times and it has always served me well. As you get more comfortable with the technique it takes less and less time to adapt to new circumstances and becomes more and more powerful. It is good for batch and data-transformation pipelines, not so much for GUI testing.

I have an HTML-oriented toolkit on GitHub, lazy regression tests, based on this approach. It's probably unsuited to a data pipeline, but you can really write your own.

JL Peyret
1

No matter how hard you test a program, it is reasonable to assume that there will always be bugs left unfound; in other words, it is impossible to check for everything. To start, I recommend that you thoroughly understand how the program works; that way, you will know what the expected and important return values are, and what exceptions should be thrown when an error occurs. You will have to write the tests yourself, which may be a hassle (and perhaps not what you were hoping for), but rigorous testing takes perseverance and determination. As you may know, debugging and fixing code can take a lot longer than writing it in the first place.

Here is the pytest documentation; I suggest you map out what you want to test before reading it. You don't need to know how pytest works before you understand how that script of yours works. Take a pen and paper if necessary and plan out which functions do what and which exceptions should be thrown. Good luck!
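
For instance, once you have mapped out which errors a function should raise, pytest can check for them directly (the method name, argument, and exception below are only placeholders):

import pytest

from your_script import Simulator  # placeholder import


def test_impossible_temperature_is_rejected():
    sim = Simulator()
    with pytest.raises(ValueError):       # we expect invalid input to raise an error
        sim.fn1(temperature=-500.0)       # hypothetical argument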

1

Summary: one method is to create a stand-alone doctest file from tests run on your existing code through the command-line interpreter (REPL).

Ideally you want a set of tests in place before you refactor the existing code. If you have inherited a big ball of mud (BBOM), it can be tricky to create a comprehensive set of unit tests for each function prior to refactoring. Creating doctests can be a faster way to go.

You can quickly extend your doctests file as your understanding of the code develops and you encounter edge cases you need to include in your tests.

Please find below an example for a nonsense 'little ball of mud' class (Lbom).

import random


class Lbom():
    def __init__(self, **kwargs):
        for key, value in kwargs.items():
            if key == 'color':
                self.print_color(value)
            if key == 'combo':
                self.combo(value)

    def combo(self, combination):
        print(combination * random.randint(0, 100))

    def print_color(self, color):
        print('color: {}'.format(color))

Enter the Python REPL by typing python at the command line:

>>> from lbom import *
>>> random.seed(1234)
>>> test = Lbom(color='blue', combo = 3.4)
color: blue
336.59999999999997
>>> test.print_color('red')
color: red
>>> random.seed(1010)
>>> test.combo(-1)
-85
>>>

Cut and paste the tests into a file. I wrapped these commands in a Python module as shown below and saved it as test_lbom.py in a subdirectory called tests. The advantage of saving it as a .py file, rather than just using doctest with a .txt file, is that you can place the file in a folder separate from the file under test.

test_lbom.py:

def test_lbom():
    '''
    >>> from lbom import *
    >>> random.seed(1234)
    >>> test = Lbom(color='blue', combo = 3.4)
    color: blue
    336.59999999999997
    >>> test.print_color('red')
    color: red
    >>> random.seed(1010)
    >>> test.combo(-1)
    -85
    >>>
    '''


if __name__ == '__main__':
    import doctest
    doctest.testmod(name='test_lbom', verbose=True)

Run this using:

python -m tests.test_lbom

You will get verbose output showing that all the tests pass.
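
If you later settle on pytest, note that it can also collect these doctests for you, e.g.:

python -m pytest --doctest-modules tests/test_lbom.py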

Oppy