I have a Keras Sequential model and a custom accuracy metric. The best accuracy I have reached so far is 62%, but I need to reach 70% as a requirement.
The problem is the following:
Given an input of size Nx10000 and an output of size Nx5, achieve 70% accuracy under the following metric: a pair (y, y_pred) is a match if |y_i - y_pred_i| <= 10 for every i in 1..5. That is, every value in y_pred must be within 10 of the corresponding value in y. So 70% accuracy means that at least 70% of the predicted outputs satisfy that condition.
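To make the metric concrete, this is roughly how I evaluate it over a whole prediction set (a NumPy sketch; the function name is mine, and y_true and y_pred are assumed to be arrays of shape Nx5):

import numpy as np

def n_distance_accuracy(y_true, y_pred, tol=10):
    # a sample counts as a match only if all 5 outputs are within `tol` of the targets
    matches = np.all(np.abs(y_true - y_pred) <= tol, axis=1)
    return matches.mean()  # fraction of matching samples, e.g. 0.62 so far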
The outputs, as mentioned, have 5 features, and each feature is between 0 and 100. I am currently using 'mae' as my loss function, because I assumed that minimizing it would also improve the metric above, but I am not sure that is true. Below is the code for my Keras model, and also for the function I want to optimize.
# imports (I use tf.keras; adjust if you are on standalone keras)
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
# from tensorflow.keras import callbacks

class NeuralNetMulti(Regressor):
    def __init__(self):
        self.name = 'keras-sequential'
        self.model = Sequential()
        # self.earlystopping = callbacks.EarlyStopping(monitor="mae",
        #                                              mode="min", patience=5,
        #                                              restore_best_weights=True)

    def fit(self, X, y):
        print('Fitting into the neural net...')
        n_inputs = X.shape[1]
        n_outputs = y.shape[1]
        self.model.add(Dense(1000, input_dim=n_inputs, kernel_initializer='he_uniform', activation='relu'))
        # self.model.add(Dense(100, activation='relu'))
        # self.model.add(Dense(128, activation='relu'))
        self.model.add(Dense(300, activation='relu'))
        self.model.add(Dense(200, activation='relu'))
        self.model.add(Dense(200, activation='relu'))
        self.model.add(Dense(n_outputs))
        self.model.summary()
        self.model.compile(loss='mae', optimizer='adam', metrics=['mse', 'mae'])
        self.model.fit(X, y, verbose=1, epochs=60, validation_split=0.1)
        # self.model.fit(X, y, verbose=1, epochs=1000, callbacks=[self.earlystopping])
        print('Fitting completed!')

    def predict(self, X):
        print('Predicting...')
        predictions = self.model.predict(X, verbose=1)
        print('Predicted!')
        return predictions
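For reference, I call the class roughly like this (the variable names here are just placeholders for my own train/test split):

reg = NeuralNetMulti()
reg.fit(X_train, y_train)      # X_train: Nx10000, y_train: Nx5
preds = reg.predict(X_test)    # float predictions of shape Mx5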
And the function I want to optimize:
def N_distance(y_true, y_pred):
    """
    Returns 0 (a match) if every output value is within 10 of its target,
    and 1 (a miss) otherwise.
    :param y_true: the 5 true values for one sample
    :param y_pred: the 5 predicted values for one sample
    :return: 0 or 1
    """
    vals = abs(y_true - y_pred)
    if all(a <= 10 for a in vals):
        return 0
    return 1
This function basically returns 0 (no error) if all features are within a distance of 10, and 1 otherwise. Maybe I should modify it to return the number of features that differ by more than 10; I am not sure.
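Something like this NumPy version is what I have in mind for that variant (per-sample, name is mine, and it is not yet a Keras-compatible metric):

def n_distance_soft(y_true, y_pred, tol=10):
    # number of the 5 output features that miss the target by more than `tol`
    return int(np.sum(np.abs(y_true - y_pred) > tol))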
Another thing that concerns me is that my model currently returns float predictions, while the outputs really should be integers between 0 and 100. I am not sure what the best way to handle that is.
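The only approach I can think of so far is post-processing the raw predictions by rounding and clipping them into the valid range, roughly like this:

int_preds = np.clip(np.rint(predictions), 0, 100).astype(int)  # force integers in [0, 100]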
More information about the data: The input data is usually around 50-60K samples, each sample having 10K features.