I am designing a model with two outputs, y and dy, where I have much more training data for y than for dy, while the locations (x) of the data points are the same (please see the image below).
I am handling this issue with sample_weight in keras.Model.fit. There are two concerns:
1. If I pass zero for a sample weight, the loss turns into NaN after the first training step. I instead have to pass a very small number, and I am not sure how that affects the training.
2. This is inefficient if I have multiple outputs, many of which have training data at only a few locations, because all the training samples are still included in the updates.

Is there any other way to handle this case?
Note that Keras trains the model fine as it is; however, I am looking for a more efficient approach that also lets me pass exactly zero for the unwanted weights.
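My understanding (an assumption on my part, from reading the Keras source) is that the NaN in concern 1 comes from Keras normalizing the per-batch loss by the sample weights, which divides by zero when a batch contains only zero-weight samples. A hand-rolled masked loss along these lines would sidestep that division (just a sketch; masked_mse is my own name, not a Keras API):

import tensorflow as tf

def masked_mse(y_true, y_pred):
    # Treat NaN targets as "missing": they contribute no loss and no gradient.
    mask = tf.math.is_finite(y_true)
    safe_true = tf.where(mask, y_true, tf.zeros_like(y_true))
    sq_err = tf.square(safe_true - y_pred) * tf.cast(mask, y_pred.dtype)
    # Guard the denominator so an all-missing batch gives 0 instead of NaN.
    n_valid = tf.maximum(tf.reduce_sum(tf.cast(mask, y_pred.dtype)), 1.0)
    return tf.reduce_sum(sq_err) / n_valid

With something like this I could mark the rows where dy is unknown with np.nan and compile with loss={'y': 'mse', 'dy': masked_mse}, dropping the sample weights entirely.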
Please see the code below:
import numpy as np
import keras as k
import tensorflow as tf
from matplotlib.pyplot import plot, show, legend
# Custom gradient: Keras' built-in gradient does not work in this Lambda-layer
# setup, so fall back to tf.gradients (which returns a list of tensors).
def custom_grad(y, x):
    return tf.gradients(y, x, unconnected_gradients='zero', colocate_gradients_with_ops=True)
# Set up the Keras model: dy is the derivative of y with respect to x.
x = k.Input((1,), name='x', dtype='float32')
lay = k.layers.Dense(10, activation='tanh')(x)
lay = k.layers.Dense(10, activation='tanh')(lay)
y = k.layers.Dense(1, name='y')(lay)
dy = k.layers.Lambda(lambda f: custom_grad(f, x), name='dy')(y)
model = k.Model(x, [y, dy])
# Preparing training data.
num_samples = 10000
x_true = np.linspace(0.0, np.pi, num_samples)
y_true = np.sin(x_true)
dy_true = np.zeros_like(y_true)
# For dy, we only have values at certain points -
# say 10% of what is available for y, taken from the beginning and the end.
percentage = 0.1
dy_ids = np.concatenate((np.arange(0, num_samples*percentage, dtype=int),
                         np.arange(num_samples*(1-percentage), num_samples, dtype=int)))
dy_true[dy_ids] = np.cos(x_true[dy_ids])
# I use sample weights to compensate for the unbalanced available data.
y_sample_weight = np.ones_like(y_true)
# A tiny non-zero weight everywhere; passing exactly zero results in NaN.
dy_sample_weight = np.zeros_like(y_true) + 1.0e-8
# Rescale the weights at the known dy points so they sum to num_samples.
dy_sample_weight[dy_ids] = num_samples/dy_ids.size
assert abs(dy_sample_weight.sum() - num_samples) <= 1.0e-3
# Training the model.
model.compile("adam", loss="mse")
model.fit(x_true, [y_true, dy_true],
          sample_weight=[y_sample_weight, dy_sample_weight],
          epochs=50, shuffle=True)
[y_pred, dy_pred] = model.predict(x_true)
# Plot expected vs. predicted outputs.
plot(x_true, y_true, '.k', label='y true')
plot(x_true[dy_ids], dy_true[dy_ids], '.r', label='dy true')
plot(x_true, y_pred, '--b', label='y pred')
plot(x_true, dy_pred, '--r', label='dy pred')
legend()
show()
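For concern 2, what I have in mind would look roughly like the sketch below: two single-output models sharing the layers built above, each fit only on the rows where data actually exists, so the dy branch never sees padded samples. (model_y and model_dy are names I made up, and I have not verified that this is actually more efficient.)

# Sketch: reuse the tensors defined above; the two models share all weights.
model_y = k.Model(x, y)
model_dy = k.Model(x, dy)
model_y.compile("adam", loss="mse")
model_dy.compile("adam", loss="mse")
for epoch in range(50):
    # Alternate the updates; each fit only touches rows with real data.
    model_y.fit(x_true, y_true, epochs=1, verbose=0)
    model_dy.fit(x_true[dy_ids], dy_true[dy_ids], epochs=1, verbose=0)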