I am trying to run a linear model on my data using statsmodels. My dataframe looks like the following:

              0     Group    Age    Education
3_0001    190.8      1.0      47       12
3_0002    482.1      1.0      44       16
4_0003    144.1      0.0      38       18
4_0004    205.6      0.0      51       15

The first column is the index. The second column header is a 0 with several leading spaces. There are 88 rows of data. My code is as follows:

import statsmodels.formula.api as sm

formula = "'" + list(df)[0] + " ~ " + list(df)[1] + "'"
model = sm.ols(formula, data=df).fit()

I am getting an error message that says:

Traceback (most recent call last):
  File "AUC.py", line 109, in <module>
    model = sm.ols("'"+formula+"'", data=nodeDF_clean).fit()
  File "/usr/local/lib64/python3.6/site-packages/statsmodels/base/model.py", line 169, in from_formula
    missing=missing)
  File "/usr/local/lib64/python3.6/site-packages/statsmodels/formula/formulatools.py", line 65, in handle_formula_data
    NA_action=na_action)
  File "/usr/local/lib/python3.6/site-packages/patsy/highlevel.py", line 310, in dmatrices
    NA_action, return_type)
  File "/usr/local/lib/python3.6/site-packages/patsy/highlevel.py", line 169, in _do_highlevel_design
    return_type=return_type)
  File "/usr/local/lib/python3.6/site-packages/patsy/build.py", line 893, in build_design_matrices
    rows_checker.check(value.shape[0], name, origin)
  File "/usr/local/lib/python3.6/site-packages/patsy/build.py", line 795, in check
    raise PatsyError(msg, origin)
patsy.PatsyError: Number of rows mismatch between data argument and '      0 ~ Group' (88 versus 1)
    '      0 ~ Group'
    ^^^^^^^^^^^^^^^^^

I'm using patsy 0.5.1 and Python 3.6.8. I tried renaming the first column to get rid of the leading spaces, and I have tried many different variations of the ols formula, all with the same error. What am I doing wrong? Thanks in advance.
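
A possible cause, offered as a sketch rather than a confirmed diagnosis: the formula string is built with literal quote characters around it ("'" + ... + "'"), and the traceback echoes those quotes back. patsy then appears to read the whole thing as a single Python string constant instead of a `response ~ predictor` formula, which would explain the "88 versus 1" row mismatch. Separately, a column named 0 is not a valid Python identifier, so it probably needs patsy's `Q()` quoting (or a rename). The helper names `build_formula` and `fit_ols` below are hypothetical, and the sketch assumes the column names are strings that contain no double quotes or backslashes.

```python
def build_formula(response, predictor):
    """Return a patsy formula string with both column names wrapped in Q().

    Q("...") asks patsy to look the text up as a column name, which is
    needed when a name (such as "0" or "      0") is not a valid Python
    identifier. No extra quote characters are added around the formula.
    """
    return 'Q("{}") ~ Q("{}")'.format(response, predictor)


def fit_ols(df, response, predictor):
    """Fit an OLS model of `response` on `predictor` from DataFrame `df`."""
    # Imported inside the function so build_formula can be used on its own.
    import statsmodels.formula.api as smf

    formula = build_formula(response, predictor)
    return smf.ols(formula, data=df).fit()
```

With the dataframe from the question this would be called as `model = fit_ols(df, list(df)[0], list(df)[1])`. The names passed in must match the column labels exactly, including any leading spaces, unless the column has been renamed first.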

shall