AttributeError: module 'statsmodels.formula.api' has no attribute 'OLS' in spyder

Question

import statsmodels.formula.api as sm
X = np.append(arr = np.ones((50, 1)).astype(int) , values = X, axis = 1)
X_opt = X[:, [0 ,1, 2, 3, 4, 5]]
regressor_OLS = sm.OLS(endog = y, exog = X_opt).fit()
regressor_OLS.summary()

Traceback (most recent call last):

File "", line 1, in regressor_OLS = sm.OLS(endog = y, exog = X_opt).fit()

AttributeError: module 'statsmodels.formula.api' has no attribute 'OLS'

Does this answer your question? [AttributeError: module 'statsmodels.formula.api' has no attribute 'OLS'](https://stackoverflow.com/questions/56449787/attributeerror-module-statsmodels-formula-api-has-no-attribute-ols) — Tomerikoo, Nov 09 '21 at 22:05

Croves · Answer 1 · 2019-11-21T13:47:45.610

0

In Python, function names are case-sensitive. Note that you are calling a function OLS (all capitalized), while the correct way is all lowercase.

Please be aware that in statsmodels package there are two OLS modules:

statsmodels.regression.linear_model.OLS

statsmodels.formula.api.ols

You are importing the formula API but applying the linear model function. You need to understand which one you want.

Please refeer to these link for more information: https://www.statsmodels.org/dev/generated/statsmodels.formula.api.ols.html#statsmodels.formula.api.ols https://www.statsmodels.org/dev/generated/statsmodels.regression.linear_model.OLS.html#statsmodels.regression.linear_model.OLS

edited Nov 21 '19 at 13:47

answered Nov 20 '19 at 18:50

Croves

400
3
11

```import statsmodels.formula.api as sm X = np.append(arr = np.ones((50, 1)).astype(int) , values = X, axis = 1) X_opt = X[:, [0 ,1, 2, 3, 4, 5]] regressor_OLS = sm.ols(endog = y, exog = X_opt).fit() regressor_OLS.summary() Traceback (most recent call last): File "", line 4, in regressor_OLS = sm.ols(endog = y, exog = X_opt).fit() TypeError: from_formula() missing 2 required positional arguments: 'formula' and 'data'``` – Mohamed Magdy Nov 20 '19 at 19:30
check the documentation for the difference between `OLS` and `ols`. The latter requires a formula – Josef Nov 20 '19 at 21:38
1

the answer here is incorrect because `ols` has a different signature and is a shortcut to the `from_formula` method of the model classes – Josef Nov 21 '19 at 13:29
@Josef You are absolutelly correct. Thank you for pointing this, I made some changes in my answer. – Croves Nov 21 '19 at 13:48

score 0 · Answer 2 · edited Apr 23 '20 at 13:25

Try this:

    import statsmodels.api as sm
    x = np.append(arr=np.ones((50,1)).astype(int),values=x,axis=1)
    x_opt=x[:,[0,1,2,3,4,5]]
    regressor_ols=sm.OLS(endog=y,exog=x_opt).fit()

This will work because statsmodels.api contain Ordinary least squares(OLS) for more information check out this link https://www.statsmodels.org/devel/generated/statsmodels.regression.linear_model.OLS.html

score 0 · Answer 3 · answered Nov 09 '21 at 21:46

0

You are calling all caps OLS() instead of using lowercase ols() method that actually exists in that package.

answered Nov 09 '21 at 21:46

GreatDuke

214
1
10

score 0 · Answer 4 · edited Aug 16 '22 at 04:52

Here is the complete code. No need to change any, just paste and run

# Multiple regression
#data preprocessing
#data about 50 companies about their expenses and their profits

# 5 methods of building models
    # 1 All-in (means through all variables )
    # Backward Elimination  ---------- (stepwise regression)
    # Forward Selection ---------- (stepwise regression)
    # Bidirectional Elimination ---------- (stepwise regression)
    # Score Comparison

import numpy as np
import pandas as pd

# Loading data set
dataset = pd.read_csv('50_Startups.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 4].values

# Encoding Categorical Data
#from sklearn.preprocessing import LabelEncoder, OneHotEncoder
#labelencoder = LabelEncoder()
#X[:, 3] = labelencoder.fit_transform(X[:, 3])
#onehotencoder = OneHotEncoder(categorical_features = [3])
#X = onehotencoder.fit_transform(X).toarray()
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
ct = ColumnTransformer(
    [('one_hot_encoder', OneHotEncoder(categories='auto'), [3])],   # The column numbers to be transformed (here is [0] but can be [0, 1, 3])
    remainder='passthrough'                                         # Leave the rest of the columns untouched
)
X = ct.fit_transform(X)

# Avoiding the Dummy Variable Trap
X = X[:, 1:] # it will less or delete One dummy variable i.e from 3 to 2 dummy variables

# Splitting the dataset into the Training set and Test set i.e Train_set = 40 and Test_set = 10
#from sklearn.cross_validation import train_test_split
#X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, 
                                                    y, 
                                                    test_size = 0.2, #20%  of whole dataset will be test dataset
                                                    random_state = 0)
# Fitting Multiple Linear REgression to the Training set
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)

# Predicting the Test set results
y_pred = regressor.predict(X_test)


#Building the optimal model using Backward Elimination (Based on statistical significant)
# Backword Elimination
    # 1. Select a significance level to stay in the model (e.g SL = 0.05)
    # 2. Fit the full model with all possible predictors
    # 3. Consider the predictor with the highest P-values. 
        #if P>SL, go to STEP 4, otherwise go to Finish
    # 4. REmove the predictor
    # 5. Fit model without this variable *

#--------------------------------------------------------------#

# 1. Select a significance level to stay in the model (e.g SL = 0.05)
#import statsmodels.formula.api as sm #by course
import statsmodels.regression.linear_model as sm
# X = np.append(arr = X, 
#              values = np.ones((50, 1)).astype(int), #ccolumn of 50 rows and 1 column
#              axis = 1) # this will append column with values 1 at end of X 
# to bring at 1st exchange values
X = np.append(arr = np.ones((50, 1)).astype(int), 
              values = X,
              axis = 1)
#X_opt = X[:, [0,1,2,3,4,5]] #by course
X_opt = np.array(X[:, [0, 1, 2, 3, 4, 5]], dtype=float)

#--------------------------------------------------------------#

# 2. Fit the full model with all possible predictors
# OLS stands for OrdinaryLeastSquares
regressor_OLS = sm.OLS(endog = y, exog = X_opt).fit()

#--------------------------------------------------------------#

# 3. Consider the predictor with the highest P-values. 
    #if P>SL, go to STEP 4, otherwise go to Finish
print(regressor_OLS.summary())

#--------------------------------------------------------------#

# 4. Remove the predictor
# the 3rd dummy variable which is at index 2 has highest P value and greater than 0.05
# Remove index 2 column
X_opt = np.array(X[:, [0, 1, 3, 4, 5]], dtype=float)

#--------------------------------------------------------------#

# 5. Fit model without this variable *
regressor_OLS = sm.OLS(endog = y, exog = X_opt).fit()

#--------------------------------------------------------------#

# Repeat step 3 and so on
print(regressor_OLS.summary())
# the 2nd dummy variable which is at index 1 has highest P value and greater than 0.05
# Remove index 1 column
X_opt = np.array(X[:, [0, 3, 4, 5]], dtype=float)
regressor_OLS = sm.OLS(endog = y, exog = X_opt).fit()
print(regressor_OLS.summary())

# the administration variable which is at index 2 has highest P value and greater than 0.05
# Remove index 2 column
X_opt = np.array(X[:, [0, 3, 5]], dtype=float)
regressor_OLS = sm.OLS(endog = y, exog = X_opt).fit()
print(regressor_OLS.summary())


# the marketing variable which is at index 2 has highest P value and greater than 0.05
# Remove index 2 column
X_opt = np.array(X[:, [0, 3]], dtype=float)
regressor_OLS = sm.OLS(endog = y, exog = X_opt).fit()
print(regressor_OLS.summary())

# the R&D is the only powerful predictor to predict the profit in true sense because its P value is very very small and has greater significance

AttributeError: module 'statsmodels.formula.api' has no attribute 'OLS' in spyder

4 Answers4