0

Basic setup: I'm trying to run a logit regression in python on the probability of founding a business (founder variable) the exogenous variables are year, age, edu_cat (education category), and sex.

The X matrix is (4, 650), and the y matrix(1, 650). All of the variables within the x matrix have 650 non-NaN observations.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix

x=np.array ([ df_all['Year'], df_all['Age'], df_all['Edu_cat'], df_all['sex']])
y= np.array([df_all['founder']])

logit_model = sm.Logit(y, x)
result = logit_model.fit()
print(result)

So I'm tracking that the shape is good, but python is telling me otherwise. Am I missing something basic?

1 Answers1

0

I believe the issue is with the Y array, being [650,1], when it should be [650,], which it defaults to. Additionally I needed to make the x array [650,4] through a transpose.

  • As it’s currently written, your answer is unclear. Please [edit] to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Mar 12 '22 at 03:07