I'm having a hard time getting a regressor with a custom loss function to work correctly. I'm currently using several datasets containing data from transprecision computing benchmark experiments; here's a snippet from one of them:

| var_0 | var_1 | var_2 | var_3 | err_ds_0      | err_ds_1      | err_ds_2      | err_ds_3      | err_ds_4      | err_mean       | err_std           |
|-------|-------|-------|-------|---------------|---------------|---------------|---------------|---------------|----------------|-------------------|
| 27    | 45    | 35    | 40    | 16.0258634564 | 15.9905086513 | 15.9665402702 | 15.9654006879 | 15.9920739469 | 15.98807740254 | 0.02203520210917  |
| 42    | 23    | 4     | 10    | 0.82257142551 | 0.91889119458 | 0.93573069325 | 0.81276879271 | 0.87065388914 | 0.872123199038 | 0.049423964650445 |
| 7     | 52    | 45    | 4     | 2.39566262913 | 2.4233107563  | 2.45756544291 | 2.37961745294 | 2.42859839621 | 2.416950935498 | 0.027102139332226 |

(Sorry in advance for the markdown table, couldn't find a better way to do this)

Each err_ds_* column is obtained from a different benchmark execution, using the specified var_* configuration (each var_* column contains the number of bits of precision used for a specific variable); each error cell actually contains the negative natural logarithm of the error (since the actual values are really small), and the err_mean and err_std for each row are computed from these values.
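
For context, the transformation is roughly the following (simplified; the raw_err_ds_* column names are just illustrative, standing for the untransformed raw error values):

import numpy as np

# Each err_ds_* cell is the negative natural logarithm of the (very small) raw error
for i in range(5):
    df['err_ds_%d' % i] = -np.log(df['raw_err_ds_%d' % i])

err_cols = ['err_ds_%d' % i for i in range(5)]
df['err_mean'] = df[err_cols].mean(axis=1)
# ddof=0 (population std) reproduces the err_std values in the snippet above
df['err_std'] = df[err_cols].std(axis=1, ddof=0)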

During data preparation for the network, I reshape the dataset so that each benchmark execution becomes a separate row (which means we end up with multiple rows sharing the same var_* values but each with a different error value); then I separate the data (what we usually pass to the fit function as x) from the target (what we usually pass to the fit function as y), obtaining, respectively, the two tables below (a code sketch of this reshaping follows them):

| var_0 | var_1 | var_2 | var_3 |
|-------|-------|-------|-------|
| 27    | 45    | 35    | 40    |
| 27    | 45    | 35    | 40    |
| 27    | 45    | 35    | 40    |
| 27    | 45    | 35    | 40    |
| 27    | 45    | 35    | 40    |
| 42    | 23    | 4     | 10    |
| 42    | 23    | 4     | 10    |
| 42    | 23    | 4     | 10    |
| 42    | 23    | 4     | 10    |
| 42    | 23    | 4     | 10    |
| 7     | 52    | 45    | 4     |
| 7     | 52    | 45    | 4     |
| 7     | 52    | 45    | 4     |
| 7     | 52    | 45    | 4     |
| 7     | 52    | 45    | 4     |

and

| log_err       |
|---------------|
| 16.0258634564 |
| 15.9905086513 |
| 15.9665402702 |
| 15.9654006879 |
| 15.9920739469 |
| 0.82257142551 |
| 0.91889119458 |
| 0.93573069325 |
| 0.81276879271 |
| 0.87065388914 |
| 2.39566262913 |
| 2.4233107563  |
| 2.45756544291 |
| 2.37961745294 |
| 2.42859839621 |
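
In code, the reshaping is roughly this (simplified; the rows may come out grouped by err_ds_* column rather than by configuration, as in the tables above):

var_cols = ['var_0', 'var_1', 'var_2', 'var_3']
err_cols = ['err_ds_%d' % i for i in range(5)]

# One row per benchmark execution: the var_* values are repeated once per err_ds_* column
long_df = df.melt(id_vars=var_cols, value_vars=err_cols, value_name='log_err')

data = long_df[var_cols]        # what we pass to fit() as x
target = long_df[['log_err']]   # what we pass to fit() as y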

Finally, we split the set again into train data (which we're going to call train_data_regr and train_target_tensor) and test data (which we're going to call test_data_regr and test_target_regr), all of which are scaled using scaler_regr_*.fit_transform(df) (where the scaler_regr_* objects are StandardScaler() instances from sklearn.preprocessing) and then fed into the network.
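
Roughly, the split-and-scale step looks like this (simplified; the test fraction and variable names are just for illustration):

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

train_data, test_data, train_target, test_target = train_test_split(
    data, target, test_size=0.2)

scaler_regr_data = StandardScaler()
scaler_regr_target = StandardScaler()

# As described above, every set is scaled with fit_transform
train_data_regr = scaler_regr_data.fit_transform(train_data)
train_target_tensor = scaler_regr_target.fit_transform(train_target)
test_data_regr = scaler_regr_data.fit_transform(test_data)
test_target_regr = scaler_regr_target.fit_transform(test_target)

The network definition and training code then look like this: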

# Imports used by the snippet below (standalone Keras assumed; adjust to tensorflow.keras if needed)
import numpy as np
from keras import backend as K
from keras import optimizers, regularizers
from keras.models import Sequential
from keras.layers import Dense
from keras.callbacks import EarlyStopping, ReduceLROnPlateau, TerminateOnNaN

n_features = train_data_regr.shape[1]
input_shape = (train_data_regr.shape[1],)

pred_model = Sequential()

# Input layer
pred_model.add(Dense(n_features * 3, activation='relu',
   activity_regularizer=regularizers.l1(1e-5), input_shape=input_shape))

# Hidden dense layers
pred_model.add(Dense(n_features * 8, activation='relu', 
   activity_regularizer=regularizers.l1(1e-5)))
pred_model.add(Dense(n_features * 4, activation='relu', 
   activity_regularizer=regularizers.l1(1e-5)))

# Output layer (two neurons, one for the mean, one for the std)
pred_model.add(Dense(2, activation='linear'))

# Loss function: negative Gaussian log-likelihood.
# The two output neurons are split into mu (predicted mean) and logvar
# (predicted log-variance), so the predicted std is exp(0.5 * logvar).
def neg_log_likelihood_loss(y_true, y_pred):
    sep = y_pred.shape[1] // 2
    mu, logvar = y_pred[:, :sep], y_pred[:, sep:]
    return K.sum(0.5 * (logvar + np.log(2 * np.pi)
                        + K.square((y_true - mu) / K.exp(0.5 * logvar))), axis=-1)

# Callbacks
early_stopping = EarlyStopping(
        monitor='val_loss', patience=10, min_delta=1e-5) 
reduce_lr = ReduceLROnPlateau(
        monitor='val_loss', patience=5, min_lr=1e-5, factor=0.2) 
terminate_nan = TerminateOnNaN()

# Compiling
adam = optimizers.Adam(lr=0.001, decay=0.005)
pred_model.compile(optimizer=adam, loss=neg_log_likelihood_loss)

# Training
history = pred_model.fit(train_data_regr, train_target_tensor, 
        epochs=20, batch_size=64, shuffle=True, 
        validation_split=0.1, verbose=True,
        callbacks=[early_stopping, reduce_lr, terminate_nan])

predicted = pred_model.predict(test_data_regr)
actual = test_target_regr
actual_rescaled = scaler_regr_target.inverse_transform(actual)
predicted_rescaled = scaler_regr_target.inverse_transform(predicted)
test_data_rescaled = scaler_regr_data.inverse_transform(test_data_regr)

Finally, the obtained data is evaluated with a custom function that compares the actual values with the predicted ones (namely true mean vs. predicted mean and true std vs. predicted std) using several metrics (such as MAE and MSE), and plots the results with matplotlib.
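
The evaluation is essentially along these lines (heavily simplified; here true_mean and true_std stand for the per-configuration err_mean and err_std values aligned with the test rows, and the first/second columns of predicted_rescaled are taken as the predicted mean and std):

from sklearn.metrics import mean_absolute_error, mean_squared_error
import matplotlib.pyplot as plt

pred_mean = predicted_rescaled[:, 0]
pred_std = predicted_rescaled[:, 1]

print('mean -> MAE: %.4f, MSE: %.4f' % (
    mean_absolute_error(true_mean, pred_mean),
    mean_squared_error(true_mean, pred_mean)))
print('std  -> MAE: %.4f, MSE: %.4f' % (
    mean_absolute_error(true_std, pred_std),
    mean_squared_error(true_std, pred_std)))

plt.plot(true_mean, label='actual mean')
plt.plot(pred_mean, label='predicted mean')
plt.legend()
plt.show()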

The idea is that the two outputs of the network are going to predict the mean and the std of the error, given a var_* configuration as input.
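
Just to make the parameterization explicit: with neg_log_likelihood_loss defined as above, the second output is a log-variance rather than a std, so the mean/std pair implied by the raw (still scaled) network outputs would be recovered like this:

raw_pred = pred_model.predict(test_data_regr)
pred_mu = raw_pred[:, 0]                    # predicted mean (in scaled target space)
pred_sigma = np.exp(0.5 * raw_pred[:, 1])   # predicted std = exp(logvar / 2)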

Now, to the actual question: since with this code I'm getting very good results for the prediction of the mean (even across different benchmarks) but terrible results for the prediction of the std, I wanted to ask whether this is the right way to predict the two values. I'm sure I'm missing something very basic here, but after two weeks I think I'm stuck for good.
