I was looking at robust linear regression (RLM) in statsmodels and couldn't find a way to specify "weights" for the regression, that is, to assign a weight to each observation the way WLS does for least squares regression in statsmodels.

Or is there a way to get around it?

http://www.statsmodels.org/dev/rlm.html

CuriousMind

1 Answer

RLM currently does not allow user-specified weights. Weights are used internally to implement the iteratively reweighted least squares fitting method.

If the weights have the interpretation of variance weights to account for different variances across observations, then rescaling the data, both endog y and exog x, in analogy to WLS will produce the weighted parameter estimates.

WLS uses the following in its whiten method to rescale y and x:

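import numpy as np

def whiten(self, X):
    # Multiply each observation (row) by the square root of its weight.
    X = np.asarray(X)
    if X.ndim == 1:
        return X * np.sqrt(self.weights)
    elif X.ndim == 2:
        return np.sqrt(self.weights)[:, None] * X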

I'm not sure whether all extra results that are available will be appropriate for the rescaled model.

Edit: Follow-up based on comments

In WLS the equivalence W*( Y_est - Y )^2 = (sqrt(W)*Y_est - sqrt(W)*Y)^2 means that the parameter estimates are the same independent of the interpretation of weights.
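A quick numerical check of this equivalence, using only NumPy and simulated data (the data and weights are arbitrary): minimizing the weighted sum of squares directly gives the same parameters as ordinary least squares on data rescaled by sqrt(W).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([0.5, -1.5]) + rng.normal(size=n)
w = rng.uniform(0.2, 3.0, size=n)  # arbitrary positive weights

# (a) Minimize sum_i w_i * (y_i - x_i'b)^2 via the weighted normal equations.
XtWX = X.T @ (w[:, None] * X)
XtWy = X.T @ (w * y)
beta_weighted = np.linalg.solve(XtWX, XtWy)

# (b) Ordinary least squares on data rescaled by sqrt(w).
sw = np.sqrt(w)
beta_rescaled, *_ = np.linalg.lstsq(sw[:, None] * X, sw * y, rcond=None)

same = np.allclose(beta_weighted, beta_rescaled)
print(beta_weighted, beta_rescaled, same)
```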

In RLM we have a nonlinear objective function g((y - y_est) / sigma), for which this equivalence does not hold in general:

fw * g((y - y_est) / sigma) != g((y - y_est) * sw / sigma )

where fw are frequency weights and sw are scale or variance weights and sigma is the estimated scale or standard deviation of the residual. (In general, we cannot find sw that would correspond to the fw.)

That means that in RLM we cannot use rescaling of the data to account for frequency weights.
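To illustrate with one concrete choice of g, here is a small sketch using Huber's rho function with sigma fixed at 1 (both are assumptions for the illustration; `huber_rho` is a hand-written helper, not a statsmodels function). With sw = sqrt(fw) the two sides agree while the residual stays in the quadratic region, but not once it reaches the linear region.

```python
import numpy as np

def huber_rho(z, t=1.345):
    """Huber's rho: quadratic for |z| <= t, linear beyond."""
    z = np.abs(z)
    return np.where(z <= t, 0.5 * z**2, t * z - 0.5 * t**2)

fw = 4.0          # frequency weight
sw = np.sqrt(fw)  # candidate scale weight, as it would be in WLS

# Small residual: both r and r * sw stay in the quadratic region.
r_small = 0.5
lhs_small = fw * huber_rho(r_small)
rhs_small = huber_rho(r_small * sw)

# Large residual: in the linear region the two sides differ.
r_large = 3.0
lhs_large = fw * huber_rho(r_large)
rhs_large = huber_rho(r_large * sw)

print(lhs_small, rhs_small)
print(lhs_large, rhs_large)
```

Because the match depends on the size of the residual, no single sw reproduces a given fw for all observations.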

Aside: Current development in statsmodels is adding different weight categories to GLM, to establish a pattern that can then be extended to other models. The goal is to offer, similar to Stata, at least freq_weights, var_weights and prob_weights as options in the models.

Josef
  • What do you mean by "rescaling the data"? – CuriousMind Aug 06 '17 at 14:48
  • As in the code snippet from WLS, multiply the data, x and y, by the square root of the weights, or equivalently divide by the prior standard deviation. – Josef Aug 06 '17 at 14:50
  • Hmm, interesting, I did not expect that simply multiplying the data would work. But I guess that is the case since we are effectively converting W*( Y_est - Y )^2 to (sqrt(W)*Y_est - sqrt(W)*Y)^2 ... – CuriousMind Aug 06 '17 at 14:54
  • I added a comment about frequency weights to the answer to point out the difference between weights in WLS and possible weights in RLM. – Josef Aug 06 '17 at 15:53