I'm approximating a 2D function using a neural network. I've managed to get the approximation working, but now I need to compute the first- and second-order partial derivatives (du/dx, du/dy, d^2u/dx^2, and d^2u/dy^2) for the loss function in this particular application. I'm doing it like this:
def train_neural_network_batch(x_ph, predict=False):
    prediction = neural_network_model(x_ph)
    pred_dx = tf.gradients(prediction, x1_ph)
    pred_dx2 = tf.gradients(tf.gradients(prediction, x1_ph), x1_ph)
    pred_dy = tf.gradients(prediction, x2_ph)
    pred_dy2 = tf.gradients(tf.gradients(prediction, x2_ph), x2_ph)
Assuming N training points per axis, x_ph has shape (N**2, 2) (it is the 2D input to the function), and x1_ph and x2_ph just contain the first and second columns of x_ph, respectively. The lines that are supposed to compute the second derivatives throw an error:
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py", line 630, in gradients
    gate_gradients, aggregation_method, stop_gradients)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py", line 683, in _GradientsHelper
    gradient_uid)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py", line 239, in _DefaultGradYs
    with _maybe_colocate_with(y.op, gradient_uid, colocate_gradients_with_ops):
AttributeError: 'NoneType' object has no attribute 'op'
FWIW, the code works fine when I have a 1D function and compute the second derivatives as above. I'm assuming there's something obvious I'm missing about the data structures in the neural network that is causing the error. Does anyone know what's wrong? The following MWE works just fine, btw:
# Load modules
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection

# Build an N x N grid of sample points and flatten it into column vectors
N = 40
a = 0.0
b = 2.0*np.pi
xin = np.arange(a, b, (b-a)/N).reshape((N,1))
yin = np.arange(a, b, (b-a)/N).reshape((N,1))
X_tmp, Y_tmp = np.meshgrid(xin, yin)
X = np.reshape(X_tmp, (N**2,1))
Y = np.reshape(Y_tmp, (N**2,1))

# This is the exact second partial of Z = sin(x+y) with respect to x
Zxx = -np.sin(X_tmp+Y_tmp)

# Placeholders for the inputs x and y, and the function z to differentiate
x = tf.placeholder('float', [N**2,1])
y = tf.placeholder('float', [N**2,1])
z = tf.sin(x+y)
# tf.gradients returns a list; the nested call gives d^2z/dx^2
var_grad = tf.gradients(tf.gradients(z,x), x)

with tf.Session() as session:
    var_grad_val = session.run(var_grad, feed_dict={x:X, y:Y})

grad1 = np.reshape(var_grad_val, (N,N))

# Plot the second derivative computed by TensorFlow
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
surf = ax.plot_surface(X_tmp, Y_tmp, grad1, cmap=cm.coolwarm)
plt.show()

# Plot the absolute error against the exact second derivative
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
surf = ax.plot_surface(X_tmp, Y_tmp, np.abs(grad1-Zxx), cmap=cm.coolwarm)
plt.show()
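One difference between the MWE and the failing snippet at the top that may be worth checking: in the MWE, z is built directly from x, whereas in the network code prediction is built from x_ph but differentiated with respect to x1_ph. If x1_ph is a separate placeholder rather than a tensor that prediction was actually computed from, tf.gradients returns None for it, and passing that None into a second tf.gradients call would fail. The sketch below illustrates this under stated assumptions: toy_model is a hypothetical stand-in for neural_network_model (which isn't shown in the post), x1_detached is a made-up name, and tf.compat.v1 is used only so the graph-mode calls also run on newer TensorFlow versions.

```python
import numpy as np
import tensorflow as tf

tf1 = tf.compat.v1  # graph-mode API (assumes TensorFlow 1.14+ or 2.x)


def toy_model(inputs):
    # Hypothetical stand-in for neural_network_model: u(x, y) = sin(x + y)
    return tf.sin(inputs[:, 0:1] + inputs[:, 1:2])


def run_demo(n=5):
    # Small n x n grid flattened to column vectors, as in the MWE
    pts = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    X_tmp, Y_tmp = np.meshgrid(pts, pts)
    X = X_tmp.reshape(-1, 1).astype(np.float32)
    Y = Y_tmp.reshape(-1, 1).astype(np.float32)

    with tf.Graph().as_default():
        # Case 1: the model is fed x_ph, and x1_detached is an unrelated
        # placeholder, so there is no path from it to the prediction.
        x_ph = tf1.placeholder(tf.float32, [None, 2])
        x1_detached = tf1.placeholder(tf.float32, [None, 1])
        detached_grad = tf1.gradients(toy_model(x_ph), x1_detached)[0]

        # Case 2: the model input is assembled from x1_ph and x2_ph, so the
        # prediction depends on both and nested gradients are defined.
        x1_ph = tf1.placeholder(tf.float32, [None, 1])
        x2_ph = tf1.placeholder(tf.float32, [None, 1])
        prediction = toy_model(tf.concat([x1_ph, x2_ph], axis=1))
        pred_dx = tf1.gradients(prediction, x1_ph)[0]
        pred_dx2 = tf1.gradients(pred_dx, x1_ph)[0]
        pred_dy = tf1.gradients(prediction, x2_ph)[0]
        pred_dy2 = tf1.gradients(pred_dy, x2_ph)[0]

        with tf1.Session() as session:
            dx2_val, dy2_val = session.run(
                [pred_dx2, pred_dy2], feed_dict={x1_ph: X, x2_ph: Y})

    exact = -np.sin(X + Y)  # d2u/dx2 = d2u/dy2 = -sin(x + y)
    return detached_grad, dx2_val, dy2_val, exact


detached_grad, dx2_val, dy2_val, exact = run_demo()
print("gradient w.r.t. detached placeholder:", detached_grad)
print("max |pred_dx2 - exact|:", np.abs(dx2_val - exact).max())
print("max |pred_dy2 - exact|:", np.abs(dy2_val - exact).max())
```

If this is indeed the cause, another option would be to differentiate with respect to x_ph itself and slice the columns of the result; whether either variant fits the real code depends on how x1_ph and x2_ph are defined there.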