
Is there a Python statistical package that supports the computation of weighted covariance (i.e., each observation has a weight)? Unfortunately, numpy.cov does not support weights.

Preferably something that works within the numpy/scipy framework (i.e., able to use numpy arrays to speed up the computation).

Thanks a lot!

CuriousMind

2 Answers


statsmodels has a weighted covariance calculation in its `stats` module.

But we can also calculate it directly:

# -*- coding: utf-8 -*-
"""descriptive statistic with case weights

Author: Josef Perktold
"""

import numpy as np
from statsmodels.stats.weightstats import DescrStatsW


np.random.seed(987467)
x = np.random.multivariate_normal([0, 1.], [[1., 0.5], [0.5, 1]], size=20)
weights = np.random.randint(1, 4, size=20)

xlong = np.repeat(x, weights, axis=0)

ds = DescrStatsW(x, weights=weights)

print('cov statsmodels')
print(ds.cov)

self = ds  # alias so the copied expression matches the statsmodels source

ds_cov = np.dot(self.weights * self.demeaned.T, self.demeaned) / self.sum_weights

print('\nddof=0')
print(ds_cov)
print(np.cov(xlong.T, bias=1))

# the same calculation with the unbiased (ddof=1) normalization
ds_cov0 = np.dot(self.weights * self.demeaned.T, self.demeaned) / \
              (self.sum_weights - 1)
print('\nddof=1')
print(ds_cov0)
print(np.cov(xlong.T, bias=0))

This prints:

cov statsmodels
[[ 0.43671986  0.06551506]
 [ 0.06551506  0.66281218]]

ddof=0
[[ 0.43671986  0.06551506]
 [ 0.06551506  0.66281218]]
[[ 0.43671986  0.06551506]
 [ 0.06551506  0.66281218]]

ddof=1
[[ 0.44821249  0.06723914]
 [ 0.06723914  0.68025461]]
[[ 0.44821249  0.06723914]
 [ 0.06723914  0.68025461]]

Editorial note

The initial answer pointed out a bug in statsmodels that has been fixed in the meantime.

Josef
  • Looks like the statsmodels bug was [fixed in 2013](https://github.com/statsmodels/statsmodels/issues/370#issuecomment-15357376). – drevicko Sep 02 '16 at 13:36
  • @Mayou36 Thank you for the edit. It was already rejected by the time I saw it. I updated my answer to reflect the corrected statsmodels version – Josef Oct 18 '16 at 19:10
  • I think this only works if we consider the weights to be integers that represent a multiple of the respective observation. E.g. if the weights are small floats that sum up to 1 or less than 1, the unbiased normalization fails to work. In other words, the weights are like `fweights` in https://numpy.org/doc/stable/reference/generated/numpy.cov.html, but NOT like `aweights` in numpy's cov. – Make42 Apr 30 '21 at 15:03
  • `freq_weights` do not have to be integers. The only assumption for degrees of freedom correction is that `sum_weights` corresponds to number of observations, or effective nobs. For example for robust regression full weights is 1 for an option, and observations that are outlier candidates have weights smaller than 1. e.g. M-estimator for covariance or scatter matrix. – Josef Apr 30 '21 at 15:42
  • statsmodels does NOT rescale weights (in contrast to Stata). So users can use different scalings (sum_weights) for different interpretation of weights. – Josef Apr 30 '21 at 15:44
  • However, `DescrStatsW` treats weights as `freq_weights` and only that interpretation is verified. – Josef Apr 30 '21 at 15:46

Since version 1.10, `numpy.cov` does support weighted covariance computation with the `aweights` argument.
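A minimal sketch of both weight arguments of `numpy.cov` (the seed and data here are arbitrary, not taken from the answers above):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(20, 2))     # 20 observations, 2 variables
w = rng.integers(1, 4, size=20)  # integer case weights

# fweights: integer frequency weights -- each row is counted w[i] times,
# so the result matches the covariance of the explicitly repeated data.
cov_f = np.cov(x.T, fweights=w)
cov_rep = np.cov(np.repeat(x, w, axis=0).T)
assert np.allclose(cov_f, cov_rep)

# aweights: relative reliability weights (floats allowed); the default
# normalization differs from fweights, so the results generally differ.
cov_a = np.cov(x.T, aweights=w.astype(float))
```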

Lukas
  • It also accepts `fweights` (which works like the weights in statsmodels). This should be the accepted answer nowadays. – Make42 Apr 30 '21 at 15:03