I am an absolute beginner in Python programming and am currently learning basic statistics with it.

I am getting a

"PatsyError: Error evaluating factor: NameError:"

on the line pred = model.predict(pd.DataFrame(calo['wt']))

Below is my code:

# For reading data set
# importing necessary libraries
import pandas as pd 
import numpy as np
import matplotlib.pyplot as plt

# reading a csv file using pandas library
calo=pd.read_csv("/Users/Sanjeev/Desktop/Excel R Assignments/Simple Linear Regression/calories_consumed.csv")
calo.columns = ['wt','cal']

np.corrcoef(calo.wt,calo.cal)

plt.plot(calo.wt,calo.cal,"bo");plt.xlabel("WEIGHT");plt.ylabel("CALORIES")

# For preparing linear regression model we need to import the statsmodels.formula.api
import statsmodels.formula.api as smf
model = smf.ols("wt~cal",data=calo).fit()

# For getting coefficients of the variables used in the equation
model.params

# P-values for the variables and R-squared value for prepared model
model.summary()

model.conf_int(0.05) # 95% confidence interval

pred = model.predict(pd.DataFrame(calo['wt']))

This throws the following error:

Traceback (most recent call last):

  File "<ipython-input-43-4fcbf1ee1921>", line 1, in <module>
    pred = model.predict(pd.DataFrame(calo['wt']))

  File "/anaconda3/lib/python3.7/site-packages/statsmodels/base/model.py", line 837, in predict
    exog = dmatrix(design_info, exog, return_type="dataframe")

  File "/anaconda3/lib/python3.7/site-packages/patsy/highlevel.py", line 291, in dmatrix
    NA_action, return_type)

  File "/anaconda3/lib/python3.7/site-packages/patsy/highlevel.py", line 169, in _do_highlevel_design
    return_type=return_type)

  File "/anaconda3/lib/python3.7/site-packages/patsy/build.py", line 888, in build_design_matrices
    value, is_NA = _eval_factor(factor_info, data, NA_action)

  File "/anaconda3/lib/python3.7/site-packages/patsy/build.py", line 63, in _eval_factor
    result = factor.eval(factor_info.state, data)

  File "/anaconda3/lib/python3.7/site-packages/patsy/eval.py", line 566, in eval
    data)

  File "/anaconda3/lib/python3.7/site-packages/patsy/eval.py", line 551, in _eval
    inner_namespace=inner_namespace)

  File "/anaconda3/lib/python3.7/site-packages/patsy/compat.py", line 43, in call_and_wrap_exc
    exec("raise new_exc from e")

  File "<string>", line 1, in <module>

PatsyError: Error evaluating factor: NameError: name 'cal' is not defined
    wt~cal
       ^^^

I need your help to resolve this.

Thanks in advance. :)

mortonjt
Sanjeev Raikar

3 Answers

1

Looking at the statsmodels API here, it looks like they expect the parameters as input, rather than the covariates.

So what you probably want is

pred = model.predict(model.params)
mortonjt
  • Hi @mortonjt! I just rechecked the code and it turns out to be a coding error. The input parameters are the same as I mentioned above; the call just has to change a bit: pred = model.predict(pd.DataFrame(wcat['Waist'])) – Sanjeev Raikar Mar 21 '19 at 06:52
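For the formula in the question, a minimal sketch of a working call might look like the following. In "wt~cal" the column to the right of the tilde, cal, is the predictor, so the frame passed to predict needs a cal column rather than wt. As far as I can tell, predict on the fitted results object (what .fit() returns) takes predictor data, while the parameters-first signature belongs to the unfitted model class. The data below is synthetic and its numbers are assumptions, since the original CSV is not available; the names cal_values, wt_values and results are illustrative only.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for calories_consumed.csv (assumed values, not the real data)
rng = np.random.default_rng(0)
cal_values = rng.uniform(1500, 4000, size=30)
wt_values = 0.4 * cal_values - 500 + rng.normal(0, 50, size=30)
calo = pd.DataFrame({"wt": wt_values, "cal": cal_values})

# "wt ~ cal" models wt (left of ~) as a function of cal (right of ~)
results = smf.ols("wt ~ cal", data=calo).fit()

# predict() on the fitted results needs the right-hand-side column, here 'cal'
pred = results.predict(calo[["cal"]])

# In-sample predictions should match the fitted values stored on the results
print(np.allclose(pred, results.fittedvalues))
```

If the intent was instead to predict calories from weight, the formula itself would have to be "cal ~ wt", and then a frame with a wt column would be the right input.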
0

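You need to pass the independent variable (x), the one used to predict the dependent variable (y):

import pandas as pd
import statsmodels.formula.api as smf
model = smf.ols('y ~ x', data=df).fit()  # fit first; predict is then called on the fitted results
model.predict(pd.DataFrame(df['x']))     # the frame must contain a column named 'x'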
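To make the column-name requirement concrete, here is a small sketch on hypothetical toy data (the values and the names new_x, new_pred and failed are made up for illustration). It predicts for new observations and then shows that a frame without an x column cannot be used. The exact exception type may differ between statsmodels versions, so the sketch catches a generic Exception.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical toy data; the column names mirror the formula 'y ~ x'
df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 5.0],
                   "y": [2.1, 3.9, 6.2, 7.8, 10.1]})
results = smf.ols("y ~ x", data=df).fit()

# New observations must carry a column named exactly 'x'
new_x = pd.DataFrame({"x": [6.0, 7.0]})
new_pred = results.predict(new_x)
print(new_pred)

# A frame that only has 'y' cannot supply 'x', so evaluating the formula fails
try:
    results.predict(df[["y"]])
    failed = False
except Exception as exc:  # exception type may vary by version
    failed = True
    print(type(exc).__name__)
```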
Suraj Rao
0

I was having this problem. I was doing something like this:

for _, i in frame.iterrows():
    model.predict(i)

This passes each row as a Series, which doesn't provide predict with the necessary column headers. You have to wrap the row in a DataFrame instead:

for _, i in frame.iterrows():
    model.predict(pd.DataFrame([i]))
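A runnable sketch of the row-by-row approach, on hypothetical toy data (the values and the names row_preds and all_preds are assumptions for illustration). It also compares the loop against a single call on the whole frame, which should give the same numbers and is usually simpler when per-row handling is not needed.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical toy data for a formula-based model
frame = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 5.0],
                      "y": [2.1, 3.9, 6.2, 7.8, 10.1]})
results = smf.ols("y ~ x", data=frame).fit()

# Row by row: wrapping the Series in a list yields a one-row frame with headers
row_preds = []
for _, row in frame.iterrows():
    one_row = pd.DataFrame([row])
    row_preds.append(results.predict(one_row).iloc[0])

# Whole frame in one call, no loop needed
all_preds = results.predict(frame)

print(np.allclose(row_preds, all_preds))
```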
Dharman
Elliot