
In R, we use the code below for a weighted GLM:

glm(formula, data = data, weights = w)

R Documentation: an optional vector of ‘prior weights’ to be used in the fitting process. Should be NULL or a numeric vector.

In Python, using statsmodels.formula.api:

smf.glm(formula, data, freq_weights = w)

Python Documentation: 1d array of frequency weights. The default is None. If None is selected or a blank value, then the algorithm will replace with an array of 1’s with length equal to the endog.

Is `weights` in R the same as `freq_weights` in Python? (I am getting different beta estimates in Python and R. They are close, but slightly different.)

Ussu20
  • What do the respective documentations say about the `weight` parameters? – Roland Mar 05 '21 at 11:54
  • Added the documentation details in the question. It's not very clear. – Ussu20 Mar 05 '21 at 14:20
    As far as I remember, R glm weights are `var_weights` not `freq_weights`. statsmodels GLM has both. In some cases both kinds of weights produce the same results, but not for all family link combinations and standard errors can differ in general. – Josef Mar 05 '21 at 17:04
  • Thanks @Josef. Please share any link/material that shows mathematically the difference between the two kinds of weights. – Ussu20 Mar 05 '21 at 17:11
  • Also, in R, we have `residuals(glm, type)` to get the series of residuals. Is there an equivalent option in Python? (I could not find anything that simple in Python.) – Ussu20 Mar 05 '21 at 17:14
    I answered https://stackoverflow.com/questions/66493682/glm-residual-in-python-statsmodel/66496779#66496779 for residuals – Josef Mar 05 '21 at 17:22

2 Answers


As far as I remember, R glm weights are var_weights not freq_weights.

statsmodels GLM has both. In some cases both kinds of weights produce the same results, but not for all family link combinations and standard errors can differ in general.

This notebook illustrates some of the differences https://www.statsmodels.org/stable/examples/notebooks/generated/glm_weights.html

var_weights are often used when the outcome variable represents an average of several observations and the variance depends on the number of observations that have been used in the average.

freq_weights are mainly a shortcut if we have several identical observations. For example, if we only have categorical explanatory variables, then freq_weights can be used for the counts of unique observations.

Josef

I haven't worked with Python, but it might have to do with Python and R using different types of sums of squares for the model by default. Here is an overview of the different types for R: http://www.dwoll.de/r/ssTypes.php

Ju Ko