I would like to prove in Dafny that variance(x)
(i.e., covariance(x,x)
) is >=0
. Note that we define covariance(x,y)
as: 1/N*Summation{i=1 to N}{(x_i-x_mean)*(y_i-y_mean)}
, where N
is the number of elements of x
and y
.
Let us define covariance in Dafny as in Proving a covariance inequality in Dafny, use contradiction?:
//calculates the sum of all elements of a sequence
function method sum_seq(s:seq<real>): (res:real)
ensures (forall i :: 0 <= i < |s| ==> s[i] >= 0.0) ==> res>=0.0
decreases s;
{
if s == [] then 0.0 else s[0] + sum_seq(s[1..])
}
//calculates the mean of a sequence
function method mean_fun(s:seq<real>): (res:real)
requires |s| >= 1;
decreases |s|;
ensures (forall i :: 0 <= i < |s| ==> s[i] >= 0.0) ==> res>=0.0
{
sum_seq(s) / (|s| as real)
}
//from a sequence x, it constructs a=[x[0]-x_mean, x[1]-x_mean...]
function construct_list (x:seq<real>, m:real) : (a:seq<real>)
requires |x| >= 1
ensures |x|==|a|
{
if |x| == 1 then [x[0]-m]
else [x[0]-m] + construct_list(x[1..],m)
}
//it performs the Summation of the covariance
//note that a=[x[0]-x_mean, x[1]-x_mean...] and b=[y[0]-y_mean, y[1]-y_mean...]
function {:fuel 2} product(a: seq<real>, b: seq<real>) : real
requires |a| == |b|
{
if |a| == 0 then 0.0
else a[0] * b[0] + product(a[1..], b[1..])
}
//covariance is the Summation divided by the number of elements
function cov(x: seq<real>, y: seq<real>) : (res:real)
requires |x| == |y|
requires |x| >= 1
ensures cov(x,y) == product(construct_list(x, mean_fun(x)),construct_list(y, mean_fun(y))) / (|x| as real)
//i.e., ensures cov(x,y) == product(a,b) / (|x| as real)
{
var x_mean := mean_fun(x);
var y_mean := mean_fun(y);
var a := construct_list(x, x_mean);
var b := construct_list(y, y_mean);
product(a,b) / (|a| as real)
}
Thus, to prove cov(x,y)>=0
, I directly try the following:
//variance (i.e., Cov(x,x)) is always positive
lemma covarianceItself_positive(a:seq<real>)
requires |a| >= 1
requires forall i :: 0 <= i < |a| ==> a[i] >= 0.0
ensures cov(a,a) >= 0.0
{}
Which does not verify on its own. So I verify its base case and then realize that is suffices to prove that product(a,a)>=0.0
, which is stated in Lemma productItself_positive(a)
:
lemma covarianceItself_positive(a:seq<real>)
requires |a| >= 1
requires forall i :: 0 <= i < |a| ==> a[i] >= 0.0
ensures cov(a,a) >= 0.0
{
if (|a|==1){
assert cov(a,a) >= 0.0;
}
else{
productItself_positive(a);
}
}
Where productItself_positive
is the following:
lemma productItself_positive(a:seq<real>)
requires |a| >= 1
requires forall i :: 0 <= i < |a| ==> a[i] >= 0.0
ensures product(construct_list(a, mean_fun(a)),construct_list(a, mean_fun(a))) >= 0.0
{}
I am trying to make a proof for this Lemma, so I started a calculation. My problems are that (1) Dafny reports that a calculation step might not hold
; and (2) Dafny is easily getting stuck while verifying, so I do not know whether I am following the a right idea. My advances so far are as follows:
lemma productItself_positive(a:seq<real>)
requires |a| >= 1
requires forall i :: 0 <= i < |a| ==> a[i] >= 0.0
ensures product(construct_list(a, mean_fun(a)),construct_list(a, mean_fun(a))) >= 0.0
{
if (|a|==1){
assert product(construct_list(a, mean_fun(a)),construct_list(a, mean_fun(a))) >= 0.0;
}
else {
calc >= {
product(construct_list(a, mean_fun(a)),construct_list(a, mean_fun(a)));
{
assert forall x:real :: forall y:real :: (x>=0.0 && y>=0.0) ==> (x*y>=0.0);
assert forall x:real :: forall y:real :: (x>=0.0 && y>=0.0) ==> (x+y>=0.0);
assert construct_list(a, mean_fun(a))[0] * construct_list(a, mean_fun(a))[0] >= 0.0;
productItself_positive(a[1..]);
//assert product(a[1..],a[1..]) >= 0.0; //Loops forever
}
//construct_list(a, mean_fun(a))[0] * construct_list(a, mean_fun(a))[0] + product(a[1..],a[1..]); //This should hold, but says that previous calculation does not hold
//{assume construct_list(a, mean_fun(a))[0] * construct_list(a, mean_fun(a))[0] + product(a[1..],a[1..]) >= 0.0;}
//construct_list(a, mean_fun(a))[0] * construct_list(a, mean_fun(a))[0] + product(a[1..],a[1..]) >= 0.0;
//Knowing that construct_list(a, mean_fun(a))[0] * construct_list(a, mean_fun(a))[0] >= 0.0 and that product(a[1..],a[1..]), their sum should be >=0.0
0.0;
}
}
}
Any help?