I have a dataset and want to study the relationship between age and the probability of using a credit card, so I decided to fit a weighted logistic regression.
Python code:
#Imports
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
import statsmodels.api as sm
#Let's create a dataframe of our data
use = np.array([5,7,11,20,47,49,54,53,76])
total = np.array([30,33,35,40,67,61,63,59,80])
age = np.arange(16, 25)
p = use / total
df = pd.DataFrame({"Use": use, "Total": total, "Age": age, "P": p})
# Let's create our model (proportions as the response, group sizes as weights)
logit_model = smf.glm(
    formula="P ~ Age",
    data=df,
    family=sm.families.Binomial(link=sm.families.links.Logit()),
    var_weights=total,
).fit()
logit_model.summary()
Output
                 Generalized Linear Model Regression Results
==============================================================================
Dep. Variable:                      P   No. Observations:                    9
Model:                            GLM   Df Residuals:                        7
Model Family:                Binomial   Df Model:                            1
Link Function:                  Logit   Scale:                          1.0000
Method:                          IRLS   Log-Likelihood:                -147.07
Date:                Mon, 08 May 2023   Deviance:                       1.9469
Time:                        10:22:00   Pearson chi2:                     1.95
No. Iterations:                     5   Pseudo R-squ. (CS):              1.000
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept    -11.3464      1.172     -9.678      0.000     -13.644      -9.049
Age            0.5996      0.059     10.231      0.000       0.485       0.715
==============================================================================
Next, I want to calculate the AIC of this model.
logit_model.aic
Output
298.1385764743574
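As a sanity check, this value is consistent with the usual definition AIC = 2k - 2 log L, using the log-likelihood printed in the summary (-147.07) and k = 2 estimated parameters (intercept and slope). A minimal sketch; the helper name aic_from_loglik is made up for illustration and is not part of statsmodels:

```python
def aic_from_loglik(loglik: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 * log-likelihood."""
    return 2 * n_params - 2 * loglik


# Values copied (rounded) from the statsmodels summary above.
reported_loglik = -147.07
n_params = 2  # Intercept and Age

aic = aic_from_loglik(reported_loglik, n_params)
print(round(aic, 2))  # 298.14, matching logit_model.aic up to rounding
```

So the AIC reported by Python follows directly from the log-likelihood that statsmodels computed for this model.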
(In the model above I used the var_weights argument, as Josef suggested in this thread.)
Let's do the same in R.
R code:
use = c(5,7,11,20,47,49,54,53,76)
total = c(30,33,35,40,67,61,63,59,80)
age = seq(16,24)
p = use/total
logit_temp = glm(p~age,family = binomial, weights = total)
logit_temp
Output
Call:  glm(formula = p ~ age, family = binomial, weights = total)

Coefficients:
(Intercept)          age
   -11.3464       0.5996

Degrees of Freedom: 8 Total (i.e. Null);  7 Residual
Null Deviance:      156.4
Residual Deviance:  1.947    AIC: 40.12
As you can see, the coefficients and the deviance agree between the two fits, but the AIC from R (40.12) is very different from the AIC from Python (298.14). What should I change in order to get the same result?
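One possible source of the gap, sketched under assumptions: R's binomial family evaluates the AIC from the full binomial log-likelihood of the counts (use successes out of total trials), including the binomial coefficient. The sketch below recomputes that quantity in plain Python from the rounded coefficients reported above. The names binom_logpmf and count_based_aic are illustrative, not library functions.

```python
import math

use = [5, 7, 11, 20, 47, 49, 54, 53, 76]
total = [30, 33, 35, 40, 67, 61, 63, 59, 80]
age = list(range(16, 25))

# Rounded coefficients taken from the fitted models above.
intercept, slope = -11.3464, 0.5996


def binom_logpmf(k: int, n: int, p: float) -> float:
    """Log of the binomial pmf, including the log binomial coefficient."""
    log_coef = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_coef + k * math.log(p) + (n - k) * math.log(1 - p)


def count_based_aic(n_params: int = 2) -> float:
    """AIC = 2k - 2 log L, with log L summed over the observed counts."""
    loglik = 0.0
    for k, n, a in zip(use, total, age):
        p = 1 / (1 + math.exp(-(intercept + slope * a)))  # inverse logit
        loglik += binom_logpmf(k, n, p)
    return 2 * n_params - 2 * loglik


aic_counts = count_based_aic()
print(round(aic_counts, 2))  # should land close to R's reported 40.12
```

If this reproduces R's value, the two programs agree on the fit but evaluate different log-likelihoods for the AIC. In that case, passing the counts themselves (successes and failures) as the response in Python, instead of P with var_weights, would be the thing to try; that suggestion is untested here.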