Say I have a bivariate function, for example z = x² + y². I learned that in Keras I can compute nth-order derivatives using Lambda layers:
def bivariate_function(x, y):
    x2 = Lambda(lambda u: K.pow(u, 2))(x)
    y2 = Lambda(lambda u: K.pow(u, 2))(y)
    return Add()([x2, y2])
def grad(f, x):
    return Lambda(lambda u: K.gradients(u[0], u[1]))([f, x])
f = bivariate_function(x, y)
df_dx  = grad(f, x)      # 1st derivative wrt x
df_dy  = grad(f, y)      # 1st derivative wrt y
df_dx2 = grad(df_dx, x)  # 2nd derivative wrt x
df_dy2 = grad(df_dy, y)  # 2nd derivative wrt y
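(As a sanity check, independent of Keras: the analytic derivatives of z = x² + y² can be verified numerically with central finite differences. The helper below is just a hedged sketch for checking the math, not part of the model code.)

```python
def central_diff(f, x, y, wrt='x', h=1e-5):
    # Central finite difference approximating a first partial derivative.
    if wrt == 'x':
        return (f(x + h, y) - f(x - h, y)) / (2 * h)
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

f = lambda x, y: x**2 + y**2

# Analytic results: df/dx = 2x, df/dy = 2y.
print(central_diff(f, 1.5, -2.0, wrt='x'))  # ~3.0
print(central_diff(f, 1.5, -2.0, wrt='y'))  # ~-4.0
```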
However, how do I apply this approach to the derivatives of a network's output with respect to its inputs inside the loss function? I can't (?) simply feed two inputs into a dense layer the way I did above.

For example, trying to use as loss the sum of the first derivative with respect to the first variable and the second derivative with respect to the second variable (i.e. d/dx + d²/dy²), with Input(shape=(2,)), I managed to arrive here:
import numpy as np
import tensorflow as tf
from keras.models import *
from keras.layers import *
from keras import backend as K

def grad(f, x):
    return Lambda(lambda u: K.gradients(u[0], u[1]), output_shape=[2])([f, x])

def custom_loss(input_tensor, output_tensor):
    def loss(y_true, y_pred):
        df1 = grad(output_tensor, input_tensor)
        df2 = grad(df1, input_tensor)
        df = tf.add(df1[0, 0], df2[0, 1])
        return df
    return loss

input_tensor = Input(shape=(2,))
hidden_layer = Dense(100, activation='relu')(input_tensor)
output_tensor = Dense(1, activation='softplus')(hidden_layer)

model = Model(input_tensor, output_tensor)
model.compile(loss=custom_loss(input_tensor, output_tensor), optimizer='sgd')

xy = np.mgrid[-3.0:3.0:0.1, -3.0:3.0:0.1].reshape(2, -1).T
model.fit(x=xy, y=xy, batch_size=10, epochs=100, verbose=2)
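(For reference, the np.mgrid expression above builds the training set as an (N, 2) array of (x, y) pairs on a regular grid; this small sketch just shows the shape it produces.)

```python
import numpy as np

# Same grid construction as in the model code above:
# two 60x60 coordinate grids over [-3, 3) with step 0.1,
# reshaped into rows of (x, y) pairs.
xy = np.mgrid[-3.0:3.0:0.1, -3.0:3.0:0.1].reshape(2, -1).T

print(xy.shape)  # (3600, 2)
print(xy[0])     # [-3. -3.]
```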
But it just feels like I'm not doing this the proper way. Even worse, after the first epoch the loss is nothing but nans.