
In TDD, you pick a test case and write a test for it, then you write just enough production code to make that test pass, then you refactor the code; then you pick a new test case and the cycle continues.

The problem I have with this process is that TDD says you should write only enough code to pass the test you just wrote. What I mean exactly is: if a method could have, say, a million test cases, what can you do?! Obviously you are not going to write a million test cases!

Let me explain what I mean more clearly with the example below:

    internal static List<ulong> GetPrimeFactors(ulong number)
    {
        var result = new List<ulong>();

        // Divide out all the factors of 2 first.
        // (Note: number == 0 never leaves this loop.)
        while (number % 2 == 0)
        {
            result.Add(2);
            number = number / 2;
        }

        // Then try only odd divisors.
        ulong divisor = 3;

        while (divisor <= number)
        {
            if (number % divisor == 0)
            {
                result.Add(divisor);
                number = number / divisor;
            }
            else
            {
                divisor += 2;
            }
        }

        return result;
    }

The above code returns all the prime factors of a given number. ulong is 64 bits wide, which means it can take values from 0 to 18,446,744,073,709,551,615!

So how does TDD work when there can be millions of possible test cases for one piece of production functionality?

I mean, how many test cases are enough for me to be able to say I used TDD to arrive at this production code?

The TDD rule that you should only write enough code to pass your tests seems wrong to me, as the example above shows.

When is enough enough?

My own thought is that I would only pick some test cases, e.g. the upper bound, the lower bound, and a few more (say 5 test cases in total), but that's not TDD, is it?
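To show what I mean, those few cases might look something like the sketch below (NUnit-style; I'm assuming the method sits on a class called Calculator, and I start at 1 rather than 0 because, as written, the method never terminates for 0):

    using System.Collections.Generic;
    using NUnit.Framework;

    [TestFixture]
    public class GetPrimeFactorsBoundaryTests
    {
        // Lower band (1 has no prime factors).
        [Test]
        public void One_ReturnsNoFactors()
        {
            CollectionAssert.IsEmpty(Calculator.GetPrimeFactors(1));
        }

        // The smallest prime.
        [Test]
        public void Two_ReturnsItself()
        {
            CollectionAssert.AreEqual(new List<ulong> { 2 }, Calculator.GetPrimeFactors(2));
        }

        // A composite with a repeated factor: 12 = 2 * 2 * 3.
        [Test]
        public void Twelve_Returns_2_2_3()
        {
            CollectionAssert.AreEqual(new List<ulong> { 2, 2, 3 }, Calculator.GetPrimeFactors(12));
        }

        // A prime large enough that the odd-divisor loop has to do the work.
        [Test]
        public void LargePrime_ReturnsItself()
        {
            CollectionAssert.AreEqual(new List<ulong> { 104729 }, Calculator.GetPrimeFactors(104729));
        }

        // Upper band: ulong.MaxValue = 3 * 5 * 17 * 257 * 641 * 65537 * 6700417.
        [Test]
        public void MaxValue_ReturnsKnownFactorisation()
        {
            CollectionAssert.AreEqual(
                new List<ulong> { 3, 5, 17, 257, 641, 65537, 6700417 },
                Calculator.GetPrimeFactors(ulong.MaxValue));
        }
    }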

Many thanks for your thoughts on TDD for this example.

The Light
  • I can't believe that you have 1,000,000 significantly-different test cases. – John Saunders Nov 06 '11 at 19:27
  • To expand on @JohnSaunders's point, only a few different test cases are necessary to ensure every line of code is visited and performs its desired function. – Domenic Nov 06 '11 at 19:32
  • hehe, of course, like I mentioned, in practice you'd write e.g. 5 test cases, but my question was mainly about the sentence I quoted: "Only write enough code to pass your test". – The Light Nov 06 '11 at 19:32
  • Where did you find that sentence? – Domenic Nov 06 '11 at 19:40
  • In the book "Professional Test Driven Development": http://www.amazon.co.uk/Professional-Test-Driven-Development-Applications/dp/047064320X/ref=sr_1_1?ie=UTF8&qid=1320608745&sr=8-1. What you wrote in your answer makes a lot of sense; I wasn't sure. Thanks. – The Light Nov 06 '11 at 19:49
  • By the way, this question was here before: http://stackoverflow.com/questions/135789/tdd-when-you-can-move-on – alf Nov 06 '11 at 20:01
  • just came upon this old question and here is another opinion: you may write thorough tests for the small building blocks (e.g. classes which perform calculations or other non-plumbing logic) - these would be the "unit tests" - and then write "integration tests", such as tests which simulate different use cases (for example, with Selenium). I think keeping those passing gives considerable confidence that gross errors are not introduced in subsequent development, and it greatly narrows the number of combinations to test. – John Donn May 04 '12 at 15:43

11 Answers


It's an interesting question, related to the idea of falsifiability in epistemology. With unit tests, you are not really trying to prove that the system works; you are constructing experiments which, if they fail, will prove that the system doesn't work in a way consistent with your expectations/beliefs. If your tests pass, you do not know that your system works, because you may have forgotten some edge case which is untested; what you know is that as of now, you have no reason to believe that your system is faulty.

The classic example in the history of science is the question "are all swans white?". No matter how many different white swans you find, you can't say that the hypothesis "all swans are white" is correct. On the other hand, bring me one black swan, and I know the hypothesis is not correct.

A good TDD unit test is along these lines; if it passes, it won't tell you that everything is right, but if it fails, it tells you where your hypothesis is incorrect. In that frame, testing for every number isn't that valuable: one case should be sufficient, because if it doesn't work for that case, you know something is wrong.

Where the question gets interesting, though, is that unlike swans - you can't really enumerate every swan in the world, plus all their future children and their parents - you could enumerate every possible 64-bit input, which is a finite set, and verify every single situation. Also, a program is in many ways closer to mathematics than to physics, and in some cases you can truly verify whether a statement is true - but that type of verification is, in my opinion, not what TDD is after. TDD is after good experiments which aim at capturing possible failure cases, not at proving that something is true.

Mathias

You're forgetting step three:

  1. Red
  2. Green
  3. Refactor

Writing your test cases gets you to red.

Writing enough code to make those test cases pass gets you to green.

Generalizing your code to work for more than just the test cases you wrote, while still not breaking any of them, is the refactoring.
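To make that concrete with the question's factorization example, here is a minimal sketch (NUnit-style; the Calculator class name is just illustrative):

    using System.Collections.Generic;
    using NUnit.Framework;

    public static class Calculator
    {
        // Green: the least code that makes the first test below pass - a deliberate fake.
        // Adding a second test (say, 9 -> { 3, 3 }) turns the bar red again; removing the
        // hard-coding while keeping both tests green is the refactor/generalisation step
        // that eventually grows into the trial-division loop from the question.
        internal static List<ulong> GetPrimeFactors(ulong number)
        {
            return new List<ulong> { 2 };
        }
    }

    [TestFixture]
    public class PrimeFactorsCycleTests
    {
        // Red: written first, before GetPrimeFactors exists at all.
        [Test]
        public void Two_Returns_2()
        {
            CollectionAssert.AreEqual(new List<ulong> { 2 }, Calculator.GetPrimeFactors(2));
        }
    }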

Domenic
  • thanks for your comment. "Generalizing your code to work for more than just the test cases you wrote, while still not breaking any of them, is the refactoring" - that's not exactly my definition of refactoring, as I usually mean the refactoring patterns such as http://sourcemaking.com/refactoring. What you said breaks the TDD concept of writing only enough code to pass the tests, as you have written more production code than you have tests for, right? – The Light Nov 06 '11 at 19:27
  • Refactoring means making changes to the code that do not change its external output. In the context of TDD, that means making changes to the code that do not change whether it passes/fails tests. And again, the TDD concept of writing enough code only to pass the tests is steps 1-2 of TDD; you are completely ignoring step 3. – Domenic Nov 06 '11 at 19:29
  • For another perspective: the concept of "code coverage" as applied to TDD is not coverage over all possible input values, but over all possible branching logic paths. If you have test cases that cover all possible branching logic paths, you have tests for all of your code, even if you don't have tests for all of your possible inputs. – Domenic Nov 06 '11 at 19:30

You appear to be treating TDD as if it is black-box testing. It's not. If it were black-box testing, only a complete (millions of test cases) set of tests would satisfy you, because any given case might be untested, and therefore the demons in the black box would be able to get away with a cheat.

But it isn't demons in the black box in your code. It's you, in a white box. You know whether you're cheating or not. The practice of Fake It Til You Make It is closely associated with TDD, and sometimes confused with it. Yes, you write fake implementations to satisfy early test cases - but you know you're faking it. And you also know when you have stopped faking it. You know when you have a real implementation, and you've gotten there by progressive iteration and test-driving.

So your question is really misplaced. For TDD, you need to write enough test cases to drive your solution to completion and correctness; you don't need test cases for every conceivable set of inputs.

Carl Manaster

From my POV the refactoring step doesn't seem to have taken place on this piece of code...

In my book, TDD does NOT mean writing test cases for every possible permutation of every possible input/output parameter...

BUT it does mean writing all the test cases needed to ensure that the method does what it is specified to do, i.e. for such a method all boundary cases, plus a test which picks randomly a number from a list containing numbers with known correct results. If need be you can always extend this list to make the test more thorough...
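A sketch of what such a test might look like (NUnit's TestCase attribute is one way to hold the list of known results; this sketch runs the whole list rather than a random pick, and Calculator is an assumed class name):

    using NUnit.Framework;

    [TestFixture]
    public class PrimeFactorsKnownResultsTests
    {
        // Boundary cases plus a few values with known, pre-computed factorizations.
        [TestCase(1ul, new ulong[] { })]
        [TestCase(2ul, new ulong[] { 2 })]
        [TestCase(3ul, new ulong[] { 3 })]
        [TestCase(8ul, new ulong[] { 2, 2, 2 })]
        [TestCase(9699690ul, new ulong[] { 2, 3, 5, 7, 11, 13, 17, 19 })]
        public void Factors_MatchKnownResults(ulong input, ulong[] expected)
        {
            CollectionAssert.AreEqual(expected, Calculator.GetPrimeFactors(input));
        }
    }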

TDD only works in the real world if you don't throw common sense out the window...

As to

Only write enough code to pass your test

in TDD this refers to "non-cheating programmers"... IF you have one or more "cheating programmers" who, for example, just hardcode the "correct results" of the test cases into the method, I suspect you have a much bigger problem on your hands than TDD...

BTW "Testcase construction" is something you get better at the more you practice it - there is no book/guide that can tell you which testcases are best for any given situation upfront... experience pays off big when it comes to constructing testcases...

Yahia
  • "a test which picks randomly a number from a list containing numbers with known correct results" You shall not write a test which uses a random number. This might easily produce flickering tests which are non-deterministic. – Andre Nov 17 '11 at 12:30
  • @Andre generally I agree but if you specifically check this case then it is ok IMHO since we pick random numbers out of a "known list" - even picking all numbers from that list is ok. – Yahia Nov 17 '11 at 12:37
  • Testing all input/output pairs from a list is a completely different thing - in my opinion it is the right thing. I'm curious what makes it ok in this case to only execute one/some randomly chosen test(s)? The only reason I can think of is that the tests might take too long to run, in which case I would put them into another suite of tests (that don't run so often). – Andre Nov 17 '11 at 14:15
  • @Andre The function we are talking about is the factorization of a number... this can't be tested to the full extent of all possible values... so after fully testing the corner cases it is ok IMHO to test a random selection out of a list of numbers with known answers... it is no different than just making this list smaller and testing all of the smaller list. – Yahia Nov 17 '11 at 14:18
  • I guess we might have a misunderstanding here. From my point of view, there are two ways to interpret "test a random selection out of a list of numbers with known answers". 1. determine a random number (e.g. by throwing a dice) pick the corresponding test and **have it fixed** in your test. This means you're always running the same test. 2. have a list, call `rand()` or something similar in your code, pick the test depending on the result. That means you run a different test each time your test suite is run. Option 1 is ok, option 2 is not ok. – Andre Nov 17 '11 at 15:01
  • @Andre I understand what you mean and I generally agree with your POV... in this specific case I have a different POV... – Yahia Nov 17 '11 at 15:37
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/5100/discussion-between-andre-and-yahia) – Andre Nov 17 '11 at 15:49

TDD does permit you to use common sense if you want to. There's no point defining your version of TDD to be stupid, just so that you can say "we're not doing TDD, we're doing something less stupid".

You can write a single test case that calls the function under test more than once, passing in different arguments. This prevents "write code to factorize 1", "write code to factorize 2", "write code to factorize 3" being separate development tasks.

How many distinct values to test really depends how much time you have to run the tests. You want to test anything that might be a corner case (so in the case of factorization at least 0, 1, 2, 3, LONG_MAX+1 since it has the most factors, whichever value has the most distinct factors, a Carmichael number, and a few perfect squares with various numbers of prime factors) plus as big a range of values as you can in the hope of covering something that you didn't realise was a corner case, but is. This may well mean writing the test, then writing the function, then adjusting the size of the range based on its observed performance.

You're also allowed to read the function specification, and implement the function as if more values are tested than actually will be. This doesn't really contradict "only implement what's tested", it just acknowledges that there isn't enough time before ship date to run all 2^64 possible inputs, and so the actual test is a representative sample of the "logical" test that you'd run if you had time. You can still code to what you want to test, rather than what you actually have time to test.

You could even test randomly-selected inputs (common as part of "fuzzing" by security analysts), if you find that your programmers (i.e. yourself) are determined to be perverse, and keep writing code that only solves the inputs tested, and no others. Obviously there are issues around the repeatability of random tests, so use a PRNG and log the seed. You see a similar thing with competition programming, online judge programs, and the like, to prevent cheating. The programmer doesn't know exactly which inputs will be tested, so must attempt to write code that solves all possible inputs. Since you can't keep secrets from yourself, random input does the same job. In real life programmers using TDD don't cheat on purpose, but might cheat accidentally because the same person writes the test and the code. Funnily enough, the tests then miss the same difficult corner cases that the code does.
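A rough sketch of such a randomised test (NUnit-style, with Calculator as an assumed class name; the invariant checked - every returned factor is prime and the factors multiply back to the input - is one reasonable property, and the logged seed is what keeps a failing run reproducible):

    using System;
    using System.Collections.Generic;
    using NUnit.Framework;

    [TestFixture]
    public class PrimeFactorsRandomisedTests
    {
        [Test]
        public void RandomInputs_FactorsArePrimeAndMultiplyBackToInput()
        {
            // Log the seed so a failing run can be replayed with the same inputs.
            int seed = Environment.TickCount;
            Console.WriteLine("Random seed: " + seed);
            var rng = new Random(seed);

            for (int i = 0; i < 1000; i++)
            {
                // Keep inputs small enough that trial division stays fast.
                ulong input = (ulong)rng.Next(2, 1000000);

                List<ulong> factors = Calculator.GetPrimeFactors(input);

                ulong product = 1;
                foreach (ulong f in factors)
                {
                    Assert.IsTrue(IsPrime(f), $"{f} is not prime (input {input}, seed {seed})");
                    product *= f;
                }

                Assert.AreEqual(input, product, $"Factors do not multiply back to {input} (seed {seed})");
            }
        }

        private static bool IsPrime(ulong n)
        {
            if (n < 2) return false;
            for (ulong d = 2; d * d <= n; d++)
            {
                if (n % d == 0) return false;
            }
            return true;
        }
    }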

The problem is even more obvious with a function that takes a string input, there are far more than 2^64 possible test values. Choosing the best ones, that is to say ones the programmer is most likely to get wrong, is at best an inexact science.

You can also let the tester cheat, moving beyond TDD. First write the test, then write the code to pass the test, then go back and write more white box tests, that (a) include values that look like they might be edge cases in the implementation actually written; and (b) include enough values to get 100% code coverage, for whatever code coverage metric you have the time and willpower to work to. The TDD part of the process is still useful, it helps write the code, but then you iterate. If any of these new tests fail you could call it "adding new requirements", in which case I suppose what you're doing is still pure TDD. But it's solely a question of what you call it, really you aren't adding new requirements, you're testing the original requirements more thoroughly than was possible before the code was written.

Steve Jessop

When you write a test you should take meaningful cases, not every case. Meaningful cases include general cases, corner cases...

You just CAN'T write a test for every single case (otherwise you could just put all the values in a table and look the answers up, and then you'd be 100% sure your program works :P).

Hope that helps.

leo

That's sort of the first question you've got for any testing. TDD is of no importance here.

Yes, there are lots and lots of cases; moreover, there are combinations upon combinations of cases once you start building up the system. It will indeed lead you to a combinatorial explosion.

What to do about that is a good question. Usually, you choose equivalence classes for which your algorithm will probably work the same—and test one value for each class.

The next step would be to test boundary conditions (remember, the two most frequent errors in CS are off-by-one errors).

Next... Well, for all practical purposes, it's ok to stop here. Still, take a look at these lecture notes: http://www.scs.stanford.edu/11au-cs240h/notes/testing.html

PS. By the way, using TDD "by the book" for math problems is not a very good idea. Kent Beck demonstrates as much in his TDD book, ending up with just about the worst possible implementation of a function calculating Fibonacci numbers. If you know a closed form, or have an article describing a proven algorithm, just do the sanity checks described above and skip the whole TDD refactoring cycle - it'll save you time.

PPS. Actually, there's a nice article which (surprise!) mentions both the Fibonacci problem and the problem you have with TDD.

alf
  • "the worst possible implementation of factorial" - I hope that's repeated increment to get addition, then repeated addition to get multiplication. Presumably the point is that if the spec doesn't say how long the function takes to run, then "by the book" the tester isn't allowed to fail it on that basis. – Steve Jessop Nov 06 '11 at 19:50
  • Ooops, my bad. That was a function for Fibonacci numbers, of course. – alf Nov 06 '11 at 19:52
  • Just for the record: When doing TDD, you shouldn't forget the **refactor** phase, which is where you should take the "poorly implemented function" (e.g. Fibonacci), and *change the implementation **without** changing the functionality*. This means that as soon as you have a naive solution, you improve it as much as you need to make it production worthy. This is an oft ignored aspect of TDD, which tends to give it an undeserved bad rap. – Assaf Stone Nov 07 '11 at 11:22

I've never done any TDD, but what you're asking about isn't about TDD: It is about how to write a good test suite.

I like to design models (on paper or in my head) of all the states each piece of code can be in. I consider each line as if it were a part of a state machine. For each of those lines, I determine all the transitions that can be made (execute the next line, branch or not branch, throw an exception, overflow any of the sub calculations in the expression, etc).

From there I've got a basic matrix for my test cases. Then I determine each boundary condition for each of those state transitions, and any interesting mid-points between each of those boundaries. Then I've got the variations for my test cases.

From here I try to come up with interesting and different combinations of flow or logic - "This if statement, plus that one - with multiple items in the list", etc.

Since code is a flow, you often can't interrupt it in the middle unless it makes sense to insert a mock for an unrelated class. In those cases I've often reduced my matrix quite a bit, because there are conditions you just can't hit, or because the variation becomes less interesting by being masked out by another piece of logic.

After that, I'm about tired for the day, and go home :) And I probably have about 10-20 test cases per well-factored and reasonably short method, or 50-100 per algorithm/class. Not 10,000,000.

I probably come up with too many uninteresting test cases, but at least I usually overtest rather than undertest. I mitigate this by trying to factor my test cases well to avoid code duplication.

Key pieces here:

  • Model your algorithms/objects/code, at least in your head. Your code is more of a tree than a script
  • Exhaustively determine all the state transitions within that model (each operation that can be executed independently, and each part of each expression that gets evaluated)
  • Utilize boundary testing so you don't have to come up with infinite variations
  • Mock when you can

And no, you don't have to write up FSM drawings, unless you have fun doing that sort of thing. I don't :)

Merlyn Morgan-Graham

There aren't millions of test cases. Only a few. You might like to try PEX, which will let you find out the different real test cases in your algorithm. Of course, you need only test those.

Assaf Stone

What you usually do is test against boundary conditions, and a few random conditions.

For example: ulong.MinValue, ulong.MaxValue, and some values in between. Why are you even making a GetPrimeFactors? Do you want to calculate prime factors in general, or are you making it to do something specific? Test for the reason you're making it.

What you could also do is assert on result.Count instead of on all the individual items. If you know how many items you're supposed to get, plus some specific cases, you can still refactor your code; and if those cases and the total count stay the same, you can assume the function still works.
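For example (a sketch; NUnit-style asserts, with Calculator as an assumed class name):

    using System.Collections.Generic;
    using NUnit.Framework;

    [TestFixture]
    public class PrimeFactorsCountTests
    {
        [Test]
        public void Thirty_HasThreePrimeFactors()
        {
            List<ulong> factors = Calculator.GetPrimeFactors(30);

            // Assert on the count, as suggested above...
            Assert.AreEqual(3, factors.Count);

            // ...and pin one case down completely: 30 = 2 * 3 * 5.
            CollectionAssert.AreEqual(new List<ulong> { 2, 3, 5 }, factors);
        }
    }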

If you really want to test that much, you could also look into white box testing. For example, Pex and Moles are pretty good.

Ron Sijm

TDD is not a way to check that a function/program works correctly on every permutation of inputs possible. My take on it is that the probability that I write a particular test-case is proportional to how uncertain I am that my code is correct in that case.

This basically means I write tests in two scenarios: 1) some code I've written is complicated or complex and/or has too many assumptions and 2) a bug happens in production.

Once you understand what causes a bug it is generally very easy to codify in a test case. In the long term, doing this produces a robust test suite.

rz.