
I'm getting different results from R and SAS when I try to calculate a weighted variance. Does anyone know what might be causing this difference?

I create vectors of weights and values and I then calculate the weighted variance using the Hmisc library wtd.var function:

library(Hmisc)
wt <- c(5,  5,  4,  1)
x <- c(3.7,3.3,3.5,2.8)
wtd.var(x,weights=wt)

I get an answer of:

[1] 0.0612381

But if I try to reproduce these results in SAS I get a quite different result:

data test;
  input wt x;
cards;
5 3.7
5 3.3
4 3.5
1 2.8
;
run;
proc means data=test var;
var x;
weight wt;
run;

This results in an answer of:

0.2857778
Martin
    Ask SAS to post its code. `Hmisc::wtd.var` is readily available. – IRTFM Jan 09 '14 at 20:11
    SAS makes nearly all of its statistical computations available (not in code-as-in-java/c++, but in mathematical form), including the variance calculation (as answered). Asking them for source code is rather silly, unless you're going to ask Microsoft for the source code to Windows 7 and expect them to say yes? – Joe Jan 09 '14 at 21:11

1 Answer

The two programs are most likely using different divisors in the variance calculation. SAS lets you choose the divisor with the VARDEF= option, which may help here.

proc means data=test var vardef=WDF;
var x;
weight wt;
run;

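On your dataset this gives 0.0612381, the same value as R. The default, VARDEF=DF, divides by n - 1 = 3 rather than by the sum of the weights minus one (14), which is where 0.2857778 comes from. Both are 'right', depending on how you choose to calculate the weighted variance. (At my shop we calculate it a third way, of course...)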

Complete text from PROC MEANS documentation:

VARDEF=divisor specifies the divisor to use in the calculation of the variance and standard deviation. The following table shows the possible values for divisor and associated divisors.

Possible Values for VARDEF=
Value            Divisor                     Formula for Divisor
DF               degrees of freedom          n - 1
N                number of observations      n
WDF              sum of weights minus one    (Σ_i w_i) - 1
WEIGHT | WGT     sum of weights              Σ_i w_i

The procedure computes the variance as CSS/Divisor, where CSS is the corrected sums of squares and equals Sum((Xi-Xbar)^2). When you weight the analysis variables, CSS equals sum(Wi*(Xi-Xwbar)^2), where Xwbar is the weighted mean.

Default: DF

Requirement: To compute the standard error of the mean, confidence limits for the mean, or the Student's t-test, use the default value of VARDEF=.

Tip: When you use the WEIGHT statement and VARDEF=DF, the variance is an estimate of Sigma^2, where the variance of the ith observation is Sigma^2/wi and wi is the weight for the ith observation. This method yields an estimate of the variance of an observation with unit weight.

Tip: When you use the WEIGHT statement and VARDEF=WGT, the computed variance is asymptotically (for large n) an estimate of Sigma^2/wbar, where wbar is the average weight. This method yields an asymptotic estimate of the variance of an observation with average weight.
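
As a rough check of the arithmetic, the four divisors can be reproduced in base R from the CSS formula quoted above. This is only a sketch on the question's toy data; the variable names (xw_bar, css, variances) are made up for illustration and it is not how SAS or Hmisc implement the calculation internally.

```r
# Values and weights from the question
wt <- c(5, 5, 4, 1)
x  <- c(3.7, 3.3, 3.5, 2.8)

# Weighted mean and weighted corrected sum of squares (CSS)
xw_bar <- sum(wt * x) / sum(wt)
css    <- sum(wt * (x - xw_bar)^2)

# One variance per VARDEF= divisor
n <- length(x)
variances <- c(
  DF     = css / (n - 1),        # SAS default
  N      = css / n,
  WDF    = css / (sum(wt) - 1),  # matches the wtd.var result in the question
  WEIGHT = css / sum(wt)
)
print(variances)
```

The DF entry comes out at about 0.2857778 (the PROC MEANS default) and the WDF entry at about 0.0612381 (the wtd.var result), so the same CSS sits behind both numbers and only the divisor differs.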

Joe
  • Thanks. Another problem in my root issue (rather than the toy example I gave) was that I was using frequency weights rather than analytic weights. I didn't realize SAS had a separate command for that. So using either vardef=WDF OR replacing weights wt with freq wt2 corrected my issue. – Martin Jan 10 '14 at 17:23
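
The frequency-weight reading mentioned in this comment can be illustrated in base R, assuming the weights are whole-number counts: repeating each value wt times and taking the ordinary sample variance should give the same number as the WDF divisor. The names expanded and freq_var are illustrative only.

```r
# Values and weights from the question, with weights read as counts
wt <- c(5, 5, 4, 1)
x  <- c(3.7, 3.3, 3.5, 2.8)

# Repeat every value wt[i] times, giving sum(wt) = 15 observations
expanded <- rep(x, times = wt)

# The ordinary sample variance divides by 15 - 1 = 14
freq_var <- var(expanded)
print(freq_var)
```

This comes out at about 0.0612381, the value wtd.var returned in the question.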