4

I am trying to fit a set of features to statsmodels' OLS linear regression model.

I am adding a few features at a time. With the first two features, it works fine. But when I keep adding new features, it gives me an error.

Traceback (most recent call last):
  File "read_xml.py", line 337, in <module>
    model = sm.OLS(Y, X).fit()
...
  File "D:\pythonprojects\testproj\test_env\lib\site-packages\statsmodels\base\data.py", line 132, in _handle_constant
    if not np.isfinite(ptp_).all():
TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

So I changed the type of input using

X = X.astype(float)

Then a different error pops out.

Traceback (most recent call last):
  File "read_xml.py", line 339, in <module>
    print(model.summary())
...
  File "D:\pythonprojects\testproj\test_env\lib\site-packages\scipy\stats\_distn_infrastructure.py", line 1824, in sf
    place(output, (1-cond0)+np.isnan(x), self.badvalue)
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

My code looks like this.

new_df0 = pd.concat([df_lex[0], summary_df[0]], axis = 0, join = 'inner')
new_df1 = pd.concat([df_lex[1], summary_df[1]], axis = 0, join = 'inner')
data = pd.concat([new_df0, new_df1], axis = 1)
print(data.shape)
X = data.values[0:6,:]
Y = data.values[6,:]
Y = Y.reshape(1,88)
X = X.T
Y = Y.T
X = X.astype(float)
model = sm.OLS(Y, X).fit()
predictions = model.predict(X)
print(model.summary())

The first error is triggered in `model = sm.OLS(Y, X).fit()`. The second error is triggered in `model.summary()`.

But with some other features, there are no errors.

new_df0 = pd.concat([df_len[0], summary_df[0]], axis = 0, join = 'inner')
new_df1 = pd.concat([df_len[1], summary_df[1]], axis = 0, join = 'inner')

data = pd.concat([new_df0, new_df1], axis = 1)
print(data.shape)
X = data.values[0:2,:]
Y = data.values[2,:]
Y = Y.reshape(1,88)
X = X.T
Y = Y.T
X = X.astype(float)
print(X.shape)
print(Y.shape)

model = sm.OLS(Y, X).fit()
predictions = model.predict(X)
print(model.summary())

It looks like it works when I have only two features, but when a different set of 6 features is added, it gives the errors. My main concern is to understand the error. I have read a similar question related to plots in Python, but here the error is triggered inside the built-in functions, not in my code. Any suggestions for debugging are highly appreciated.

akalanka
  • 1
    One thought: what does `data.dtypes` show? It looks like something that is not a numeric array is getting passed to the `np.isfinite` and/or `np.isnan` functions. – jtweeder Nov 16 '18 at 18:23
  • 1
    I found a solution when I let one of my friends look into my code. I was only treating X as input and forgetting about Y entirely. Y was just 1/0. He proposed casting Y with `astype(float)` as well, and the model is working again. – akalanka Nov 18 '18 at 18:37
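
To illustrate the diagnosis in the comments above, here is a minimal sketch of how an object-dtype array produces this kind of `TypeError`. The small `data` frame is a hypothetical stand-in for the one in the question, and storing the values as Python objects is an assumption made to reproduce the symptom, not something the question confirms.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `data`: rows are variables, columns are samples,
# mirroring the layout in the question. Casting to object is an assumption
# used here to reproduce the symptom.
data = pd.DataFrame(
    [[0.5, 1.5, 2.5], [1.0, 0.0, 1.0]],
    index=["feature", "label"],
).astype(object)

values = data.values
print(data.dtypes.unique())  # inspect the column dtypes first
print(values.dtype)          # object

X = values[0:1, :].T
Y = values[1, :].reshape(1, 3).T


def ufunc_ok(arr):
    """Return True if np.isfinite accepts the array, False on TypeError."""
    try:
        np.isfinite(arr)
        return True
    except TypeError:
        return False


X = X.astype(float)          # only X is cast, as in the question
x_ok = ufunc_ok(X)           # float64 array: the check passes
y_ok_before = ufunc_ok(Y)    # Y is still object dtype: the check fails

Y = Y.astype(float)          # cast Y as well
y_ok_after = ufunc_ok(Y)
print(x_ok, y_ok_before, y_ok_after)
```

Checking `X.dtype` and `Y.dtype` right before calling `sm.OLS` is usually the quickest way to spot this.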

3 Answers

3

Check the dtype of `X_opt` and `y`. Both probably need to be float64 for the numerical routines to work, so try:

X_opt = X_opt.astype(np.float64)
y = y.astype(np.float64)

I had the same error and fixed it this way.

stealthyninja
2
Y = Y.astype(float)

did the trick.
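
One detail worth spelling out: `astype` returns a converted copy rather than changing the array in place, so the result has to be assigned back. A minimal sketch, using a made-up object-dtype `Y`:

```python
import numpy as np

# Made-up 1/0 labels stored as Python objects (illustrative only).
Y = np.array([[1], [0], [1]], dtype=object)

Y.astype(float)          # result is discarded, Y itself is unchanged
dtype_before = Y.dtype   # still object

Y = Y.astype(float)      # assign the converted copy back
dtype_after = Y.dtype    # now float64

print(dtype_before, dtype_after)
```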

akalanka
-1

Please use

model = sm.OLS(df.Y, df.X, missing='drop').fit()

It looks like there is a NaN value in some variable. By default `missing` is `'none'`, so no NaN checking is done, and this might be the cause.

sukhbinder
  • It still gives me the same `'isnan'` error at `model.summary()`. So I wonder whether this is related to the output of `sm.OLS(...)` because some of my input values are NaNs. – akalanka Nov 15 '18 at 19:04
  • Assuming I have NaNs in my input feature dataframe, I used `df_lex[i].replace([np.inf, -np.inf, np.nan], x)` to replace them with `x`, where `x` was 0 or 0.0001 (a small value). Still the same error. – akalanka Nov 15 '18 at 21:49
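
For reference, dropping rows with missing values can be sketched in plain NumPy as below. This is only an illustration of the idea, not how statsmodels implements `missing='drop'`, and the arrays are made up. Note that the finiteness check itself only works once the arrays have a float dtype, which may be why replacing NaNs alone did not remove the error.

```python
import numpy as np

# Made-up inputs stored as Python objects, with one missing value in X.
X = np.array([[1.0, 2.0], [np.nan, 3.0], [4.0, 5.0]], dtype=object)
Y = np.array([[1], [0], [1]], dtype=object)

# Cast first: np.isfinite and np.isnan reject object-dtype arrays.
X = X.astype(np.float64)
Y = Y.astype(np.float64)

# Keep only the rows where every entry of X and Y is finite.
keep = np.isfinite(X).all(axis=1) & np.isfinite(Y).all(axis=1)
X_clean, Y_clean = X[keep], Y[keep]

print(X_clean.shape, Y_clean.shape)  # (2, 2) (2, 1)
```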