18

I am new to machine learning, and I am trying to use Keras for a regression task. I implemented the code below, based on this example.

X = df[['full_sq','floor','build_year','num_room','sub_area_2','sub_area_3','state_2.0','state_3.0','state_4.0']]
y = df['price_doc']

X = np.asarray(X)
y = np.asarray(y)

X_train, X_test, Y_train, Y_test = train_test_split(X, y, test_size=.2)
def baseline_model():
    model = Sequential()
    model.add(Dense(13, input_dim=9, kernel_initializer='normal', 
        activation='relu'))
    model.add(Dense(1, kernel_initializer='normal'))
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model

estimator = KerasRegressor(build_fn=baseline_model, nb_epoch=100, batch_size=100, verbose=False)
kfold = KFold(n_splits=10, random_state=seed)
results = cross_val_score(estimator, X_train, Y_train, cv=kfold)
print("Results: %.2f (%.2f) MSE" % (results.mean(), results.std()))

prediction = estimator.predict(X_test)
accuracy_score(Y_test, prediction)

When I run the code I get this error:

AttributeError: 'KerasRegressor' object has no attribute 'model'

How could I correctly 'insert' the model in KerasRegressor?

user3666197
Simone

4 Answers

26

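You have to fit the estimator again after cross_val_score before you can predict on new data, because cross_val_score only trains clones of the estimator and leaves the original unfitted: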

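# Keras 2 renamed nb_epoch to epochs
estimator = KerasRegressor(build_fn=baseline_model, epochs=100, batch_size=100, verbose=False)
# random_state only takes effect together with shuffle=True
kfold = KFold(n_splits=10, shuffle=True, random_state=seed)
results = cross_val_score(estimator, X_train, Y_train, cv=kfold)
print("Results: %.2f (%.2f) MSE" % (results.mean(), results.std()))

# cross_val_score fits clones, so fit the estimator itself before predicting.
# Fit on the training split only: fitting on X, y would also include X_test.
estimator.fit(X_train, Y_train)
prediction = estimator.predict(X_test)
# accuracy_score is a classification metric and is not valid for regression;
# mean_squared_error (from sklearn.metrics) is one suitable replacement.
mean_squared_error(Y_test, prediction)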

Working Test version:

from sklearn import datasets
from sklearn.model_selection import cross_val_score, KFold
from keras.models import Sequential
from sklearn.metrics import mean_squared_error
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasRegressor
seed = 1

# Use the first 150 samples of the diabetes dataset (10 features each)
diabetes = datasets.load_diabetes()
X = diabetes.data[:150]
y = diabetes.target[:150]

def baseline_model():
    model = Sequential()
    model.add(Dense(10, input_dim=10, activation='relu'))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model


# Keras 2 renamed nb_epoch to epochs
estimator = KerasRegressor(build_fn=baseline_model, epochs=100, batch_size=100, verbose=False)
# random_state only takes effect together with shuffle=True
kfold = KFold(n_splits=10, shuffle=True, random_state=seed)
results = cross_val_score(estimator, X, y, cv=kfold)
print("Results: %.2f (%.2f) MSE" % (results.mean(), results.std()))

# cross_val_score fits clones, so fit the estimator itself before predicting
estimator.fit(X, y)
prediction = estimator.predict(X)
# accuracy_score is a classification metric and is not valid for regression;
# a regression metric such as mean_squared_error is used here instead.
mean_squared_error(y, prediction)
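
Why the extra fit is needed can be illustrated with a minimal sketch. It assumes a plain scikit-learn `LinearRegression` as a stand-in for `KerasRegressor` (an assumption made only so the snippet runs without Keras); the point it tries to show is that `cross_val_score` trains clones and leaves the estimator you pass in unfitted.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = load_diabetes(return_X_y=True)
X, y = X[:150], y[:150]

# Stand-in estimator (assumption): any scikit-learn compatible regressor
estimator = LinearRegression()
kfold = KFold(n_splits=10, shuffle=True, random_state=1)

# Each fold trains a fresh clone of `estimator` and scores it
scores = cross_val_score(estimator, X, y, cv=kfold,
                         scoring="neg_mean_squared_error")

# The original object has no learned coefficients yet
fitted_before = hasattr(estimator, "coef_")

# Fitting the original object is what makes predict() usable
estimator.fit(X, y)
fitted_after = hasattr(estimator, "coef_")
prediction = estimator.predict(X)

print("fitted after cross_val_score:", fitted_before)
print("fitted after fit():", fitted_after)
```

With the Keras wrapper, the analogous missing piece is the `model` attribute named in the error message, which is presumably only set once `fit()` has been called on the wrapper itself.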
Abhishek Thakur
  • I have tried this solution before, but I get the same result: `AttributeError: 'KerasRegressor' object has no attribute 'model'` – Simone May 23 '17 at 12:34
  • do you have this line: `estimator.fit(X, y)` ? – Abhishek Thakur May 23 '17 at 12:36
  • Yes, I have added this line just above `prediction = estimator.predict(X_test)` – Simone May 23 '17 at 12:40
  • I have also added a working test version that you can just run and check. Can you please post the full stack trace of the error so that we know where the error is? – Abhishek Thakur May 23 '17 at 12:43
  • 7
    `accuracy_score()` is not valid for a regression task. – Vivek Kumar May 23 '17 at 12:52
  • 3
    Sure it won't, I just kept it there to keep it similar to OP's question. :) – Abhishek Thakur May 23 '17 at 12:54
  • @Vivek I have just restarted the kernel, and now Abhishek's solution seems to work. What could I use instead of `accuracy_score()`? – Simone May 23 '17 at 12:55
  • Regression scorers (scoring name: metric function): `neg_mean_absolute_error`: `metrics.mean_absolute_error`; `neg_mean_squared_error`: `metrics.mean_squared_error`; `neg_median_absolute_error`: `metrics.median_absolute_error`; `r2`: `metrics.r2_score` – Abhishek Thakur May 23 '17 at 12:57
  • 1
    @Simone if the answer solved your problem, please consider accepting it. – Abhishek Thakur May 23 '17 at 13:01
  • @AbhishekThakur Could you help me find a suitable substitute for `accuracy_score()`? – Simone May 23 '17 at 16:26
  • 1
    @AbhishekThakur I don't quite understand why estimator.fit(X, y) is needed after the training. Does this command train the model another time? Thanks! – Changfu Li Dec 01 '17 at 01:58
  • 1
    @AbhishekThakur Shouldn't estimator.fit(X, y) be called on x_train and y_train? Otherwise, estimator.predict(X) will use data in X that were already seen during fitting. – seralouk Jan 18 '18 at 14:01
  • @VivekKumar Does the KFold function actually alter the estimator or the model in any way, or does it just provide a score? – Dave Jul 10 '18 at 11:08
  • 1
    @Dave KFold just provides the indices of the train and test data. `cross_val_score` clones the estimator, trains that cloned copy on the training data from each fold, and returns the scores from them. So it does not alter the main estimator (the one passed into it) in any way. – Vivek Kumar Jul 10 '18 at 11:12
  • @VivekKumar Thanks Vivek. So, how can we use KFold to train our ANN model, or is its only function to provide a score? Would you use this or any other method of cross-validation? – Dave Jul 10 '18 at 11:30
  • @ChangfuLi 's question seems pertinent to me. How does making the estimator see the same data it did during the training, help in predicting for new data? Any resolutions to this? – AkaiShuichi Dec 04 '19 at 05:32
7

To evaluate your model's performance, you can compute the error as follows. You also do not need to call KFold and cross_val_score.

import numpy as np
from sklearn import datasets
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasRegressor

# Use the first 150 samples of the diabetes dataset (10 features each)
diabetes = datasets.load_diabetes()
X = diabetes.data[:150]
y = diabetes.target[:150]

def baseline_model():
    model = Sequential()
    model.add(Dense(10, input_dim=10, activation='relu'))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model


# Keras 2 renamed nb_epoch to epochs
estimator = KerasRegressor(build_fn=baseline_model, epochs=100, batch_size=100, verbose=False)
estimator.fit(X, y)
prediction = estimator.predict(X)

# Absolute error per sample on the training data, then summary statistics
train_error = np.abs(y - prediction)
mean_error = np.mean(train_error)
min_error = np.min(train_error)
max_error = np.max(train_error)
std_error = np.std(train_error)
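
As a rough sketch, comparable summaries can also be obtained from the regression metrics in `sklearn.metrics` that were mentioned in the comments above. The arrays below are made-up toy values, not output of the Keras model; they only illustrate that `mean_absolute_error` matches the manual `np.mean(np.abs(...))` computation.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Toy targets and predictions (hypothetical values for illustration only)
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

mae = mean_absolute_error(y_true, y_pred)   # mean of |y_true - y_pred|
mse = mean_squared_error(y_true, y_pred)    # mean of (y_true - y_pred)^2
r2 = r2_score(y_true, y_pred)               # 1 - SS_res / SS_tot

# Same quantity as mean_error in the snippet above, computed by hand
manual_mae = np.mean(np.abs(y_true - y_pred))

print("MAE: %.3f  MSE: %.3f  R2: %.3f" % (mae, mse, r2))
```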
Noosh
3

Instead of KerasRegressor, you can use the model itself directly. These two snippets of code give exactly the same results:

# Keras 2 renamed nb_epoch to epochs
estimator = KerasRegressor(build_fn=baseline_model)
estimator.fit(X, y, epochs=100, batch_size=100, verbose=False, shuffle=False)
prediction = estimator.predict(X)


model = baseline_model()
model.fit(X, y, epochs=100, batch_size=100, verbose=False, shuffle=False)
prediction = model.predict(X)

Please note that the shuffle argument of fit() must be False for both KerasRegressor and the model. Moreover, to start from a fixed initial state and obtain reproducible results, you need to add these lines of code at the beginning of your script:

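import numpy as np
import tensorflow as tf
from keras import backend as K

# TensorFlow 1.x API: initialize the session, then fix the NumPy and TensorFlow seeds
session = K.get_session()
init_op = tf.group(tf.tables_initializer(), tf.global_variables_initializer(), tf.local_variables_initializer())
session.run(init_op)
np.random.seed(1)
tf.set_random_seed(1)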
Noosh
0

You should train the model on X_train and y_train. You cannot train the model on X and y unless you have extra data for testing.

Training should be done on the training set; testing and prediction should then be done on X_test.
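
A minimal sketch of that workflow follows. It assumes the scikit-learn diabetes dataset and a `LinearRegression` model as a hypothetical stand-in for the Keras model, so that it runs without Keras; the same split, fit, and evaluate steps would apply to a `KerasRegressor`.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)

# Hold out 20% of the data; the model never sees it during training
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1)

# Stand-in model (assumption): replace with the Keras estimator in practice
model = LinearRegression()
model.fit(X_train, y_train)

# Error on seen data vs. error on held-out data
train_mse = mean_squared_error(y_train, model.predict(X_train))
test_mse = mean_squared_error(y_test, model.predict(X_test))

print("train MSE: %.1f, test MSE: %.1f" % (train_mse, test_mse))
```

Only the test error estimates how the model behaves on new data; the training error is usually optimistic because the model has already seen those samples.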