I have a Keras Sequential model and a custom accuracy metric. My best result so far is 62%, but I am required to reach 70%.

The problem is the following:

Given an input of size N x 10000 and an output of size N x 5, achieve 70% accuracy with the following metric: a pair (y, y_pred) is a match if |y_i - y_pred_i| <= 10 for every i in 1..5. That is, every value in y_pred must differ by no more than 10 from the corresponding value in y. So 70% accuracy means that 70% of the predicted outputs satisfy that condition.
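To make the metric concrete, here is a minimal sketch that computes it over a whole batch. It assumes y and y_pred are NumPy arrays of shape (N, 5); the function name `within_tolerance_accuracy` and the `tol` parameter are hypothetical names chosen for illustration.

```python
import numpy as np

def within_tolerance_accuracy(y_true, y_pred, tol=10):
    """Fraction of samples whose every feature is within `tol` of the target."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # One boolean per sample: True only if all features are within tol.
    matches = np.all(np.abs(y_true - y_pred) <= tol, axis=1)
    return matches.mean()

y_true = np.array([[50, 20, 30, 40, 90],
                   [10, 10, 10, 10, 10]])
y_pred = np.array([[55, 25, 21, 40, 100],   # every feature within 10: match
                   [10, 10, 10, 10, 21]])   # last feature off by 11: no match
print(within_tolerance_accuracy(y_true, y_pred))  # 0.5
```

A single feature that misses by more than 10 makes the whole sample count as a miss, no matter how close the other four are.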

The outputs, as mentioned, have 5 features, each between 0 and 100. I am currently using 'mae' as my loss function because I thought that optimizing it would also optimize the metric above, but I am not really sure about that. Below is the code for my Keras model and for the function I want to optimize.

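from tensorflow.keras import callbacks
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential


class NeuralNetMulti(Regressor):  # Regressor: base class defined elsewhere (not shown)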
    def __init__(self):
        self.name = 'keras-sequential'
        self.model = Sequential()
        # self.earlystopping = callbacks.EarlyStopping(monitor="mae",
        #                                              mode="min", patience=5,
        #                                              restore_best_weights=True)

    def fit(self, X, y):
        print('Fitting into the neural net...')
        n_inputs = X.shape[1]
        n_outputs = y.shape[1]
        self.model.add(Dense(1000, input_dim=n_inputs, kernel_initializer='he_uniform', activation='relu'))
        #self.model.add(Dense(100, activation='relu'))
        #self.model.add(Dense(128, activation='relu'))
        self.model.add(Dense(300, activation='relu'))
        self.model.add(Dense(200, activation='relu'))
        self.model.add(Dense(200, activation='relu'))
        self.model.add(Dense(n_outputs))
        self.model.summary()
        self.model.compile(loss='mae', optimizer='adam', metrics=['mse', 'mae'])
        self.model.fit(X, y, verbose=1, epochs=60, validation_split=0.1)
        # self.model.fit(X, y, verbose=1, epochs=1000, callbacks=[self.earlystopping])
        print('Fitting completed!')

    def predict(self, X):
        print('Predicting...')
        predictions = self.model.predict(X, verbose=1)
        print('Predicted!')
        return predictions

And the function I want to optimize:

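def N_distance(y_true, y_pred):
    """
    Return 0 if every feature of one sample is within 10 of its target, else 1.

    :param y_true: true values for one sample (5 features)
    :param y_pred: predicted values for the same sample
    :return: 0 for a match, 1 otherwise
    """
    vals = abs(y_true - y_pred)
    if all(a <= 10 for a in vals):
        return 0
    return 1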

This function basically returns 0 (no error) if all features are within a distance of 10, and 1 otherwise. Maybe I should modify it to return the number of features that differ by more than 10; I don't know.
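The count-based variant mentioned above could look roughly like this. It is only a sketch, assuming NumPy arrays with the features on the last axis; `n_features_off` and `tol` are hypothetical names.

```python
import numpy as np

def n_features_off(y_true, y_pred, tol=10):
    """Per-sample count of features whose absolute error exceeds `tol`."""
    diff = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    # Sum the boolean mask over the feature axis.
    return np.sum(diff > tol, axis=-1)

y_true = np.array([[50, 20, 30, 40, 90],
                   [10, 10, 10, 10, 10]])
y_pred = np.array([[55, 25, 21, 40, 100],   # no feature off by more than 10
                   [30, 10, 10, 10, 21]])   # first and last features are off
counts = n_features_off(y_true, y_pred)
print(counts)                 # number of misses per sample
print((counts == 0).mean())   # share of samples with no misses
```

Both this count and the 0/1 version are piecewise constant in the predictions, so their gradient is zero almost everywhere. They work for evaluation, but probably not as a training loss without some smooth approximation.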

Another thing that concerns me is that my model currently returns float predictions, but my outputs really should be integers between 0 and 100. I am not sure how best to approach that problem.
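One simple option, offered as a sketch rather than a recommendation, is to leave the model as a float regressor and post-process its predictions by rounding and clipping. The helper name `to_valid_integers` and its `low`/`high` parameters are hypothetical.

```python
import numpy as np

def to_valid_integers(predictions, low=0, high=100):
    """Round float predictions to the nearest integer and clip to [low, high]."""
    rounded = np.rint(np.asarray(predictions, dtype=float))
    return np.clip(rounded, low, high).astype(int)

raw = np.array([[-3.2, 49.6, 100.7, 12.4, 99.4]])
print(to_valid_integers(raw))  # [[  0  50 100  12  99]]
```

Because the targets already lie in [0, 100], clipping can only move an out-of-range prediction closer to its target, so it cannot turn a match into a miss under the tolerance-of-10 metric.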

More information about the data: The input data is usually around 50-60K samples, each sample having 10K features.

halfer
Petar
