Z3 model for correct Dafny method

Question

For a correct method, can Z3 find a model for the method's verification condition?

I had thought not, but here is an example where the method is correct, yet verification finds a model.

This was with Dafny 1.9.7.

I am not very familiar with Visual Studio's Dafny plugin, but doesn't the red dot indicate a verification failure? If so, the debugger should present a counterexample (for the failing verification condition), not a model. — Malte Schwerhoff, Oct 12 '16 at 08:13
Yes the red dot indicates verification failure. The debugger does present an example. (That's what I meant by "model".) However the example is not a counterexample, since the lemma is true. In particular Pow(2, 902) does equal Pow(2*2, 902/2). — Theodore Norvell, Oct 12 '16 at 18:06

score 4 · Accepted Answer · answered Oct 17 '16 at 23:39

What Malte says is correct (and I found it nicely explained as well).

Dafny is sound, in the sense that it will only verify correct programs. In other words, if a program is incorrect, the Dafny verifier will never say that it is correct. However, the underlying decision problems are in general undecidable. Therefore, unavoidably, there will be cases where a program meets its specifications and the verifier still gives an error message. Indeed, in such cases, the verifier may even show a purported counterexample. It may be a false counterexample (as in the example above) -- it simply means that, as far as the verifier can tell, this is a counterexample. If the verifier spent a little more time, or if it were clever enough to unroll more function definitions, apply induction hypotheses, or do a host of other good-things-to-do, it might be able to determine that the counterexample is bogus. So, any error message you get (including any counterexample that may accompany such an error message) should be interpreted as a possible error (and possible counterexample).

Similar situations frequently occur if you're trying to verify the correctness of a loop and you don't supply a strong enough loop invariant. The Dafny verifier may then show some values of variables on entry to the loop that can never occur in actuality. The counterexample is then trying to give you an idea of how to strengthen your loop invariant appropriately.
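As a small illustration of the loop-invariant case, here is a minimal sketch. The method `Double` is hypothetical and not part of the original question. With both invariants present it should verify. If the second invariant is deleted, the verifier knows nothing about `r` at loop exit, so it reports that the postcondition might not hold and may show values of `r` that can never actually occur.

```dafny
// Hypothetical example, not taken from the question.
method Double(n: nat) returns (r: nat)
  ensures r == 2 * n
{
  r := 0;
  var i := 0;
  while i < n
    invariant i <= n
    // Removing the next invariant leaves r unconstrained after the loop,
    // so the verifier can no longer establish the postcondition.
    invariant r == 2 * i
  {
    i := i + 1;
    r := r + 2;
  }
}
```

In the weakened version, any values reported for `r` are a hint about which fact is missing from the invariant, not evidence of a bug in the loop body.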

Finally, let me add two notes to what Malte said.

First, there is at least one other source of incompleteness involved in this example, namely non-linear arithmetic. Non-linear arithmetic can sometimes be difficult to navigate around.

Second, the trick of using function Dummy can be simplified. It suffices (at least in this example) to mention the Pow call, for example like this:

lemma EvenPowerLemma(a: int, b: nat)
  requires Even(b)
  ensures Pow(a, b) == Pow(a*a, b/2)
{
  if b != 0 {
    var dummy := Pow(a, b - 2);
  }
}

Still, I like the other two manual proofs better, because they do a better job of explaining to the user what the proof is.

Rustan

Thanks. Z3 queries result in one of four results: `unsatisfiable`, `satisfiable`, `unknown`, and `time-out`. My misunderstanding was in thinking that the presence of a model implied the result from Z3 was `satisfiable`. What I didn't realize is that when the result is `unknown`, Z3 also produces a model. In fact Dafny's queries only come back as `unsatisfiable`, `unknown`, or `time-out`, and a model in the BVD implies `unknown`. — Theodore Norvell, Oct 19 '16 at 18:44

Malte Schwerhoff · Answer 2 · 2016-10-13T14:12:02.290

Dafny fails to prove the lemma due to a combination of two possible sources of incompleteness: recursive definitions (here Pow) and induction. The proof effectively fails because of too little information, i.e. because the problem is underconstrained, which in turn explains why a counterexample can be found.

Induction

Automating induction is difficult because it requires computing an induction hypothesis, which is not always possible. However, Dafny has some heuristics for applying induction, which might or might not work and which can be switched off, as in the following code:

lemma {:induction false} EvenPowerLemma_manual(a: int, b: nat)
  requires Even(b);
  ensures Pow(a, b) == Pow(a*a, b/2);
{
  if (b != 0) {
    EvenPowerLemma_manual(a, b - 2);
  }
}

With the heuristics switched off, you need to manually "call" the lemma, i.e. use the induction hypothesis (here, only in the case where b >= 2), in order to get the proof through.

In your case, the heuristics were activated, but they were not "good enough" to get the proof done. I'll explain why next.

Recursive definitions

Reasoning statically about recursive definitions by unfolding them is prone to infinite descent because it is in general undecidable when to stop. Hence, Dafny by default unrolls function definitions only once. In your example, unrolling the definition of Pow only once is not enough to get the induction heuristics to work because the induction hypothesis must be applied to Pow(a, b-2), which does not "appear" in the proof (since unrolling once only gets you to Pow(a, b - 1)). Explicitly mentioning Pow(a, b-2) in the proof, even in an otherwise meaningless formula, triggers the induction heuristics, however:

function Dummy(a: int): bool
{ true }

lemma EvenPowerLemma(a: int, b: nat)
  requires Even(b);
  ensures Pow(a, b) == Pow(a*a, b/2);
{
  if (b != 0) {
    assert Dummy(Pow(a, b - 2));
  }
}

The Dummy function is there to make sure that the assertion provides no information beyond syntactically including Pow(a, b-2). A less odd-looking assertion would be assert Pow(a, b) == a * a * Pow(a, b - 2).
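For reference, here is a self-contained sketch of that variant. The definitions of `Even` and `Pow` do not appear in the text of this thread, so the ones below are assumptions, chosen to be consistent with the steps of the calculational proof (`Pow(a, b) == a * Pow(a, b - 1)`). I have not checked this against a particular Dafny version, and because the assertion involves non-linear arithmetic it may behave differently from the `Dummy` version.

```dafny
// ASSUMED definition: the original Even is not shown in the thread.
predicate Even(n: nat)
{
  n % 2 == 0
}

// ASSUMED definition: the original Pow is not shown in the thread.
function Pow(a: int, b: nat): int
{
  if b == 0 then 1 else a * Pow(a, b - 1)
}

lemma EvenPowerLemma(a: int, b: nat)
  requires Even(b)
  ensures Pow(a, b) == Pow(a*a, b/2)
{
  if b != 0 {
    // Mentions Pow(a, b - 2) explicitly, so that the induction
    // heuristics have a term to apply the induction hypothesis to.
    assert Pow(a, b) == a * a * Pow(a, b - 2);
  }
}
```

Unlike `Dummy`, this assertion is itself a proof obligation, so it documents the intermediate step as well as supplying the needed term.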

Calculational Proof

FYI: You can also make the proof steps explicit and have Dafny check them:

lemma {:induction false} EvenPowerLemma_manual(a: int, b: nat)
  requires Even(b);
  ensures Pow(a, b) == Pow(a*a, b/2);
{
  if (b != 0) {
    calc {
         Pow(a, b);
      == a * Pow(a, b - 1);
      == a * a * Pow(a, b - 2);
      == {EvenPowerLemma_manual(a, b - 2);}
         a * a * Pow(a*a, (b-2)/2);
      == Pow(a*a, (b-2)/2 + 1);
      == Pow(a*a, b/2);
    }
  }
}

Great answer. I learned a lot from it. The thing that surprised me was not that Dafny/Z3 couldn't prove the theorem without some code in the lemma body; actually it would have surprised me if the verification had gone through without any code in the body. The thing that I didn't understand and still don't understand is why there is a "counterexample" found that isn't actually a counterexample. Usually when the BVD gives numbers, it has actually found a counterexample, i.e. an input where the code has a bug. In this case, there is no bug, so where do the numbers 902 and 2 come from? — Theodore Norvell, Oct 13 '16 at 20:00
My understanding is the following: Dafny fails to prove the property because it only unrolls `Pow` finitely often. Hence, there is an application of `Pow` about which nothing is known/assumed and whose value is effectively unconstrained. The underlying SMT solver can thus pick a model for `Pow` that does not comply with the actual function definition and that therefore may violate the lemma. The concrete numbers (here 902 and 2) are heuristically determined, i.e. more or less picked at random. — Malte Schwerhoff, Oct 14 '16 at 09:49
