Given two classes, how can I probabilistically test for equivalent behavior

Question

Let's say I have two classes which implement the same basic API, and I want to test that they are "stochastically equivalent"¹, at least over a subset of their methods.

E.g., I write my own "list" class foo::list and rather than painstakingly writing a bunch of unit tests for it, I want to compare it to std::list as a reference. That is, any sequence of operations on foo::list should produce the same results as the same sequence on std::list.

I'm OK listing the names of the operations, but hopefully not much more boilerplate than that. A generic solution that can be applied to other pairs of "behaviorally equivalent" classes is ideal.

¹ By "stochastically equivalent" I mean that no differences are observed over many series of operations, which obviously falls short of a complete proof of equivalence.

@JonHarper - no particular framework applies to this question. To be concrete, however, I'm using Catch2 at the moment, although I'm willing to consider other frameworks or say mocking libraries if they give me this ability. — BeeOnRope, Nov 08 '19 at 18:00
Oh, excellent. I'm actually working with Catch2 as well. I'll see if I can squeeze in an answer for you. — jonspaceharper, Nov 08 '19 at 18:01
There isn't a way to do this because the set of possible sequences of method calls is infinite, and any sequence of any length could produce different observable behavior. You can write some tests that demonstrate within reason that the objects behave the same, but you cannot _prove_ it by testing from the outside. Any tests you write to demonstrate similar behavior would have to be tailored to the two classes being tested. There isn't a generic algorithm you can apply. — cdhowie, Nov 08 '19 at 18:03
@cdhowie - perhaps I wasn't clear, but I wasn't looking for a proof (if only I were so lucky to prove all my code correct:)), simply a reasonable exploration of the state space. Note that for some types of classes you could potentially explore it exhaustively (and know it, if you could examine the internal state in order to determine that it has been exhaustive) - but I don't expect that even for something like "list". — BeeOnRope, Nov 08 '19 at 18:07
@BeeOnRope Even still, there isn't a "generic solution." You can randomly generate tests given method calls and domain constraints on arguments (fuzz testing), if you really wanted to take that approach, though the value of such an approach is dubious IMO. — cdhowie, Nov 08 '19 at 18:07
@cdhowie - I updated the question to make it clearer. I'd certainly hope there is a generic solution, they are pretty easy to build in other languages (lack of reflection in C++ maybe makes it more painful here, I'm not sure). Maybe you misunderstand what I mean by "generic". I mean it shouldn't be particularly tailored to any details of my "list" example. — BeeOnRope, Nov 08 '19 at 18:11
@cdhowie - eh, if you think "the value of such an approach is dubious", this probably isn't the question for you and I can only laugh at such a naive comment :). — BeeOnRope, Nov 08 '19 at 18:12
@BeeOnRope That's what makes it difficult. The tests need to be class-specific with knowledge of how they work. For example, you can't test on `list1.begin() == list2.begin()` because they will never be equal. You could test on "it takes the same number of iterations incrementing the return value of `begin()` before it equals `end()`", but then you don't have a "generic solution" anymore because you're back to writing class-specific tests. — cdhowie, Nov 08 '19 at 18:13
@cdhowie of course `list1.begin()` and `list2.begin()` are not equivalent, but you can still compare `*begin(list1) == *begin(list2)`. — jonspaceharper, Nov 08 '19 at 18:14
@JonHarper Which is only defined if `begin() != end()`. So, again, you need a pretty high volume of class-specific knowledge which makes the approach non-generic. — cdhowie, Nov 08 '19 at 18:15
@cdhowie - indeed, part of the challenge is defining and writing the scaffolding to understand what can be compared, shallow vs deep, value vs reference/pointer-like, etc. A good answer will cover this and introduce the requisite concepts. I don't really want to explore it with you exhaustively in the comments if you don't even see the value in the approach as it's likely going to be a waste of time for both of us, no? — BeeOnRope, Nov 08 '19 at 18:16

jonspaceharper · Answer 1 · 2019-11-09T03:23:31.217

In Short

Construct one foo::list and one std::list and then compare them as you perform operations on them. Really the only difference from a normal unit test is that you have two containers and instead of directly using REQUIRE() for each operation on the type you are testing, you perform the operation on the type you are testing and the reference type and then compare them. For this we assume that std::list or whatever is bug-free. We then use it as our reference point for not failing. In other words, if the operation succeeds with std::list and succeeds with foo::list, and they compare equal, the operation succeeded.

An Example

You know which subset of operations can be used to compare state and I do not, so here's a mock comparison function:

template <class T, class U>
bool compare_types(const T &t, const U &u)
{
    bool equivalent = true;
    //Generic comparisons here, like iterating over the elements to compare their values.
    //Of course update equivalent or just return false if the comparison fails.
    //If your types are not containers, perform whatever is needed to test equivalent state.
    return equivalent;
}
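For standard-style sequence containers, the mock above could be filled in along the following lines. This is only a sketch under two assumptions: both types expose begin()/end(), and their elements can be compared with ==. The function compare_types_example is a hypothetical name used for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <list>
#include <vector>

// Sketch of compare_types for sequence containers: two containers count as
// equivalent when they hold equal elements in the same order.
template <class T, class U>
bool compare_types(const T &t, const U &u)
{
    // The four-iterator overload of std::equal (C++14) also returns false
    // when the two ranges differ in length.
    return std::equal(std::begin(t), std::end(t), std::begin(u), std::end(u));
}

// Usage sketch (hypothetical): a std::list and a std::vector with the same
// contents compare as equivalent until one of them changes.
inline void compare_types_example()
{
    const std::list<int> reference{1, 2, 3};
    std::vector<int> candidate{1, 2, 3};
    assert(compare_types(reference, candidate));

    candidate.push_back(4); // lengths now differ
    assert(!compare_types(reference, candidate));
}
```

Types that are not containers would need their own notion of equivalent state, for example comparing the results of a set of const accessors.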

As Jarod42 pointed out, this can get more fun and more generic, particularly if the Op f below is a lambda (C++14 needed for generic lambdas):

template <class ValueT, class RefT, class TestT, class Op>
bool compare_op_with_value(RefT &t, TestT &u, Op f, const ValueT &value)
{
    if (!compare_types(t, u))
        return false;
    f(t, value);
    f(u, value);
    return compare_types(t, u);
}

Your function may return a value:

template <class ValueT, class RefT, class TestT, class Op>
bool compare_op_with_ret(RefT &t, TestT &u, Op f)
{
    if (!compare_types(t, u))
        return false;
    ValueT ret1 = f(t);
    ValueT ret2 = f(u);
    return ret1 == ret2 && compare_types(t, u);
}

...and so on for dereferenceable return types, etc. You'll need to write a new comparison function for each kind of test, but that's pretty trivial. You'll need to add another template parameter for return types that are not the same (e.g. an iterator).
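As a rough illustration of the iterator case, the sketch below compares what the two returned iterators refer to instead of the iterators themselves, and only dereferences them when neither is at end(). The names compare_op_with_iter_ret and iter_ret_example are hypothetical, and the container-based compare_types is an assumed stand-in for whatever state comparison suits your types.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <list>
#include <vector>

// Assumed stand-in for the state comparison: equal elements in the same order.
template <class T, class U>
bool compare_types(const T &t, const U &u)
{
    return std::equal(std::begin(t), std::end(t), std::begin(u), std::end(u));
}

// Hypothetical helper: f returns an iterator into the container it is given.
template <class RefT, class TestT, class Op>
bool compare_op_with_iter_ret(RefT &ref, TestT &test, Op f)
{
    if (!compare_types(ref, test))
        return false;
    auto it_ref = f(ref);
    auto it_test = f(test);
    // Iterators of different containers cannot be compared with each other,
    // so first check that both are at end() or both are not...
    const bool ref_at_end = (it_ref == std::end(ref));
    const bool test_at_end = (it_test == std::end(test));
    if (ref_at_end != test_at_end)
        return false;
    // ...and dereference only when that is well defined.
    if (!ref_at_end && !(*it_ref == *it_test))
        return false;
    return compare_types(ref, test);
}

// Usage sketch (hypothetical) with std::vector standing in for foo::list.
inline void iter_ret_example()
{
    std::list<int> reference{1, 2, 3};
    std::vector<int> candidate{1, 2, 3};

    auto insert_front = [](auto &c) { return c.insert(c.begin(), 0); };
    assert(compare_op_with_iter_ret(reference, candidate, insert_front));

    // Both searches fail, so both iterators are at end() and nothing is dereferenced.
    auto find_missing = [](auto &c) { return std::find(c.begin(), c.end(), 42); };
    assert(compare_op_with_iter_ret(reference, candidate, find_missing));
}
```

This checks only the pointed-to value; if the position of the iterator matters as well, the distance from begin() could be compared too.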

Then you need your test case (I subbed in std::vector as foo::list for exposition)...

TEMPLATE_TEST_CASE("StdFooCompare", "[list]", int)
{
    using std_type = std::list<TestType>;
    using foo_type = std::vector<TestType>;

    auto initializer = {0,1,2,3,4};
    std_type list1 = initializer;
    foo_type list2 = initializer;

    //testing insertion, using auto since insert() returns iterators;
    //the list is taken by reference so the insertion is not applied to a copy
    auto insert_f = [](auto &list, TestType value) -> auto {
        return list.insert(list.begin(), value);
    };
    REQUIRE(compare_op_with_value(list1, list2, insert_f, -1));

    //testing front(), being explicit about the return type rather than using auto;
    //again by reference, otherwise the returned reference would dangle
    auto front_f = [](auto &list) -> TestType & {
        return list.front();
    };
    REQUIRE(compare_op_with_ret<TestType>(list1, list2, front_f));

    //continue testing along these lines
}
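To get closer to the "many series of operations" in the question, the same comparison can be driven by a seeded random number generator. The following is a minimal sketch, not a tested framework: first_divergence and random_ops_example are hypothetical names, the operations are assumed to take an int argument, and the container-based compare_types is an assumed stand-in for your own state comparison.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <iterator>
#include <list>
#include <random>
#include <vector>

// Assumed stand-in for the state comparison: equal elements in the same order.
template <class T, class U>
bool compare_types(const T &t, const U &u)
{
    return std::equal(std::begin(t), std::end(t), std::begin(u), std::end(u));
}

// Hypothetical driver: applies `steps` randomly chosen operations to both
// containers and returns the index of the first step after which they differ,
// or -1 if no difference was observed. Each op is a generic lambda taking
// (container &, int); the same seed always replays the same sequence.
template <class RefT, class TestT, class... Ops>
int first_divergence(RefT &ref, TestT &test, unsigned seed, int steps, Ops... ops)
{
    static_assert(sizeof...(Ops) > 0, "at least one operation is required");
    // Each generic lambda is instantiated once per container type.
    const std::vector<std::function<void(RefT &, int)>> ref_ops{ops...};
    const std::vector<std::function<void(TestT &, int)>> test_ops{ops...};

    std::mt19937 rng(seed);
    std::uniform_int_distribution<std::size_t> pick_op(0, sizeof...(Ops) - 1);
    std::uniform_int_distribution<int> pick_value(-100, 100);

    for (int step = 0; step < steps; ++step) {
        const std::size_t op = pick_op(rng);
        const int value = pick_value(rng);
        ref_ops[op](ref, value);
        test_ops[op](test, value);
        if (!compare_types(ref, test))
            return step;
    }
    return -1;
}

// Usage sketch (hypothetical) with std::vector standing in for foo::list.
inline void random_ops_example()
{
    std::list<int> reference;
    std::vector<int> candidate;
    const int diverged_at = first_divergence(
        reference, candidate, /*seed=*/12345u, /*steps=*/1000,
        [](auto &c, int v) { c.push_back(v); },
        [](auto &c, int v) { c.insert(c.begin(), v); },
        [](auto &c, int) { if (!c.empty()) c.pop_back(); },
        [](auto &c, int) { if (!c.empty()) c.erase(c.begin()); });
    assert(diverged_at == -1);
}
```

Inside a Catch2 test this could be wrapped as REQUIRE(first_divergence(...) == -1), logging the seed so that a failing sequence can be replayed. Each operation guards its own preconditions (for example, not popping from an empty container), which is the class-specific knowledge raised in the comments.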

~~I could spend a couple more hours on this, but I hope you get the idea.~~ I spent more time on this.

Note: I did not actually run this code, so consider it all pseudo-code to get the idea across, e.g. I may have missed a semicolon or some such.

You can even do `template <class Ref, class T, class Func> bool compare_types(const Ref &ref, const T &t, Func f) { Assert_equal(ref, t); f(ref); f(t); Expect_equal(ref, t); }` to ensure same function is applied to both. — Jarod42, Nov 08 '19 at 18:49
