2

A co-worker was reviewing some of my unit-test code for some string generation, which kicked off a lengthy discussion. They said that the expected results should all be hard-coded, and were worried that a lot of my test cases were using what was being tested to test against.

Let's say there is a simple function that returns a string built from some arguments.

def generate_string(name, date):  # Function to test
    return f"My Name is {name} I was born on {date} and this isn't my first rodeo"

----Test----

def setUp(self):
    self.name = 'John Doe'
    self.date = '1990-01-01'

def test_that_generate_string_function(self):
    ...
    expected = "My Name is John Doe I was born on 1990-01-01 and this isn't my first rodeo"
    self.assertEqual(expected, actual)

My co-worker was insistent that the expected result should always be hard-coded, as it stops there being any chance that the actual result can influence the expected result.

def test_date_hardcoded_method(self):
    ...
    date = '1990-01-01'
    actual = generate_string(self.name, date)
    expected = "My Name is John Doe I was born on 1990-01-01 and this isn't my first rodeo"
    self.assertEqual(expected, actual)

So if they want to make sure that the date is all up to snuff, they would pass in a date value and hard-code the expected result. To me this makes sense, but it also seems completely redundant. The function already has a test to make sure the entire string is as expected. Any deviation from that is going to result in a failed test. My method was to take the actual result, deconstruct it, hard-code something specific, and throw it back together to be used as the expected result.

def test_date_deconstructed_method(self):
    ...
    date = get_date()
    actual = generate_string(self.name, date)
    actual_deconstructed = actual.split(' ')
    actual_deconstructed[-7] = '1990-01-01'  # Hard-code the small expected change
    expected = ' '.join(actual_deconstructed)
    self.assertEqual(expected, actual)

I ended up creating two unit tests using each method to see if I could understand where they were coming from, but I just don't see it. When all of the expected results are hard-coded, any little change causes the vast majority of the tests to fail. If "isn't" needs to become "is not", the hardcoded_method is going to fail until someone manually changes things, whilst the deconstructed_method only cares about the date and will still pass its test. It will only fail if something unexpected happens to the date. With only a few tests failing after a change someone else has made, it's really easy to pinpoint exactly what's gone wrong, which I thought was the whole point of unit testing.

I'm still within my first month of my first programming job. My co-worker is vastly more experienced than me. I have zero conviction in myself and normally just accept other people's opinions as truths, but this makes so much more sense to me. I understand their concern that having expected results informed by the actual results can be bad, but I trust all the other tests to form a web of informing tests. String formatting and token values are covered, as well as hard-coded tests that check for any incorrectness.

Should every test's expected results be hard-coded? Is it bad to use the actual results to inform expected results once the groundwork has already been tested?
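
For reference, here is a minimal runnable sketch of the two approaches in Python's unittest. The f-string implementation of generate_string follows the pseudocode above; get_date is a stub I have assumed purely for illustration:

import unittest


def generate_string(name, date):
    # Function under test, following the pseudocode above
    return f"My Name is {name} I was born on {date} and this isn't my first rodeo"


def get_date():
    # Stub date source, assumed purely for illustration
    return '1990-01-01'


class GenerateStringTests(unittest.TestCase):
    def setUp(self):
        self.name = 'John Doe'

    def test_date_hardcoded_method(self):
        actual = generate_string(self.name, '1990-01-01')
        expected = "My Name is John Doe I was born on 1990-01-01 and this isn't my first rodeo"
        self.assertEqual(expected, actual)

    def test_date_deconstructed_method(self):
        actual = generate_string(self.name, get_date())
        actual_deconstructed = actual.split(' ')
        actual_deconstructed[-7] = '1990-01-01'  # hard-code only the part under test
        expected = ' '.join(actual_deconstructed)
        self.assertEqual(expected, actual)


if __name__ == '__main__':
    unittest.main()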

Jay Cork
  • For the title `Is it bad practice to base expected results off actual results in unit testing?` my answer would be yes it is bad practice. – Guy Coder Apr 09 '19 at 19:47
  • For `My co-worker was insistent that the expected result should always be hard-coded, as it stops there being any chance that the actual result can influence the expected result.` It sounds like you are paraphrasing. I don't always hard-code my results, but use generators that generate both the test and the result, which do not use any of the methods or base methods in the code being tested. This is often hard because many times I have to jump through hoops to reinvent the wheel, and when I can't figure out a way, then I do hard-code the result. – Guy Coder Apr 09 '19 at 19:48
  • For `The function already has a test to make sure the entire string is as expected.` Programming and proofs are not the same thing. Very few programs are capable of proving something. For `I trust all the other tests to form a web of informing tests`; Trust me, I have a [bridge for sale](https://www.urbandictionary.com/define.php?term=I%20have%20a%20bridge%20to%20sell%20you). – Guy Coder Apr 09 '19 at 19:53

3 Answers

2

Your test cases should be designed with the program's requirements in mind. If only a part of the string needs to be validated, then only validate that part of the string. If the whole string needs validation, validate the string in its entirety. Passing unit tests should strongly indicate that all directly testable requirements have been observed.

If there is any chance that a bug has inserted weirdness into the pieces you aren't looking at, your method for testing will fail to catch those errors. If that is an acceptable risk, then you can choose to live with that chance, but you have to recognize the possibility and decide your own tolerance.
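
To make that risk concrete, here is a hypothetical sketch (the garbled name is invented for illustration): if a regression mangled a part of the string that the deconstructed-style test does not look at, that test would copy the mistake straight into its own expectation and still pass, while a fully hard-coded expectation would flag it:

# Hypothetical regression: suppose generate_string started emitting "Jon Doe" instead of "John Doe".
broken_output = "My Name is Jon Doe I was born on 1990-01-01 and this isn't my first rodeo"

# Deconstructed-style check: the bug is copied into the expectation, so the comparison still passes.
parts = broken_output.split(' ')
parts[-7] = '1990-01-01'
assert ' '.join(parts) == broken_output  # passes despite the wrong name

# Fully hard-coded expectation: the mismatch is caught.
expected = "My Name is John Doe I was born on 1990-01-01 and this isn't my first rodeo"
assert expected != broken_output  # a hard-coded assertEqual(expected, broken_output) would fail here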

rp.beltran
  • `you have to recognize the possibility and decide your own tolerance.` Try that philosophy at a bank or secure location and see how long you have a job. The company or customer sets the requirements, when in doubt, make an inquiry. – Guy Coder Apr 10 '19 at 02:29
  • @GuyCoder Haha, you're right. Maybe `you` don't get to decide your tolerance, but someone does. I'm used to working in a startup where I, perhaps unfortunately, get to make those decisions. One area though where I believe lesser validation can be acceptable, even for a large bank, would be in UI/UX design, where the worst-case scenario is a poor render. The example given looked kinda like it may be front-end sort of work, which is why I felt his needs may be somewhat relaxed. – rp.beltran Apr 10 '19 at 19:14
0

You have a function that generates a string from input data. There is the option to have test cases that always test the whole generated string, even though the goal of each test is to verify one very specific part of that string. You are correct to consider this approach bad: the resulting tests would be too broad and therefore fragile. They would fail, and have to be maintained, for any change, not only for changes that affect the specific part of the generated string. You may find it enlightening to look at Meszaros' discussion of fragile tests, in particular the part where "A test says too much about how the software should be structured or behave": http://xunitpatterns.com/Fragile%20Test.html#Overspecified%20Software

The better solution is indeed to make your tests more focused, as you also want them to be. The approach you have chosen, however, is a bit odd: you take the resulting string, make a copy, patch the copy with the hand-coded expected string part that is in focus in the respective test, and then compare two full strings again, the result and your patched copy of the result. Technically, you have created a test that truly focuses only on the expected part, since the other parts of the string surrounding it will always be equal. However, this approach is confusing: to someone not fully understanding the test code, it looks as if you are testing the code against the results of the code itself.

Why don't you do it the other way around: Take the result string, cut out the piece of interest and compare this piece against the hard-coded expectation? In your example, the test would then look like:

def test_date_part_of_generated_string(self):
    date = '1990-01-01'
    actual_full_string = generate_string(self.name, date)
    actual_string_parts = actual_full_string.split(' ')
    actual_date_part = actual_string_parts[-7]
    self.assertEqual('1990-01-01', actual_date_part)

Dirk Herrmann
0

At one point in time I agreed with the person who reviewed your code: make the tests brutally simple. At the same time I wanted to test every low-level part of my code to have full test coverage and to do TDD.

The problem, as you have identified, is that brutally simple tests are repetitive: when you need to change things for new scenarios, you have to change a lot of test code.

Then I was coding with someone with two decades more experience than me, who I know is a world-class programmer. He said, “Your tests are too repetitive; refactor them to make them less brittle.” I said, “I thought my tests needed to be brutally simple and obvious, and that means my code needs to be repetitive.” And he said, “Don’t write your test code to be any different from your production code; keep it DRY (don’t repeat yourself).”

This then brought up a whole class of meta questions about my programming. What is enough testing code? What is good testing code?

What I eventually realised was that when I wrote a lot of brutally simple and repetitive tests, I spent more time refactoring tests than I did writing new code. A large amount of repetitive testing code was brittle. It didn't keep bugs away; it made adding features or removing tech debt harder. More code isn’t more value when it comes to business logic, and likewise more verbose test code isn't helping when refactoring: it becomes “test debt”.

This then leads to another big point: loosely typed languages, which need lots of unit tests to show the code is correct, end up with lots of brittle and repetitive tests. Strongly typed languages, where the compiler can statically tell you about logic errors, mean you have to write less test code, which is less brittle, so you can refactor faster. In a loosely typed language you end up writing lots of test code that makes sure you don’t pass the wrong types at runtime. In a strongly typed functional language you only need to validate input at runtime: the compiler validates that your code works. So you can write a few high-level tests and be confident it all works, and if you refactor your code you have fewer tests to refactor. You have tagged your question “language-agnostic”, but the answer cannot be. The weaker your compiler, the more this question is a problem; the stronger your compiler, the less you have to deal with this whole issue.

I attended a four-day test-driven development course at a big software engineering shop that was done in Smalltalk. Why? Because no-one knows Smalltalk, and it is untyped, so we had to write a test for everything we wrote, as we were all beginners in that language. It was fun, but I wouldn’t advise anyone to use a loosely typed language where they have to write a load of tests to know the code works. I would strongly advise people to use a strongly typed language where the compiler does more work and there can be less test code, as that makes it easier to refactor tests when you add new functionality. Likewise, functional languages with immutable algebraic types and composition of functions need fewer tests, as they don't have lots of mutable state to worry about. The more modern the programming language, the less test code you need to write to keep bugs away.

Obviously, you cannot upgrade the language you are using at your company. So here is the one tip from my friend that sticks with me: test code should be like production code, so do not repeat yourself. If you find your tests are becoming repetitive, then delete tests. Keep the minimum number of tests that will break if the logic is broken. Don't keep fifty-odd tests that cover all variations of string concatenation. That is “over-testing”. Over-testing inhibits refactoring to add functionality and remove tech debt more than it keeps bugs away. In some languages, this means writing lots of repetitive tests as scaffolding to validate your logic as you write it. Then, when you have it working, write larger tests that will break if someone breaks a sub-part, and delete all the repetitive tests so as not to leave “test debt”. This results in a few coarse-grained tests that are brutally simple without a lot of repetition, as sketched below.
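
As a rough, hedged illustration of keeping tests DRY without giving up hard-coded expectations, the repetitive per-scenario tests can often be collapsed into one data-driven test using unittest's subTest; the generate_string implementation here simply mirrors the question's pseudocode:

import unittest


def generate_string(name, date):
    # Assumed implementation, mirroring the question's pseudocode
    return f"My Name is {name} I was born on {date} and this isn't my first rodeo"


class GenerateStringTests(unittest.TestCase):
    def test_generate_string_cases(self):
        # Each row is one scenario with a fully hard-coded expectation;
        # adding a scenario means adding a row, not another copy-pasted test.
        cases = [
            ('John Doe', '1990-01-01',
             "My Name is John Doe I was born on 1990-01-01 and this isn't my first rodeo"),
            ('Jane Roe', '1985-06-15',
             "My Name is Jane Roe I was born on 1985-06-15 and this isn't my first rodeo"),
        ]
        for name, date, expected in cases:
            with self.subTest(name=name, date=date):
                self.assertEqual(expected, generate_string(name, date))


if __name__ == '__main__':
    unittest.main()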

simbo1905