
I'm approximating a 2D function using a neural network. The approximation itself works, but for this application my loss function also needs the first- and second-order partial derivatives (du/dx, du/dy, d^2u/dx^2, and d^2u/dy^2). I'm computing them like this:

def train_neural_network_batch(x_ph, predict=False):
    prediction = neural_network_model(x_ph)

    pred_dx = tf.gradients(prediction, x1_ph)
    pred_dx2 = tf.gradients(tf.gradients(prediction, x1_ph), x1_ph)

    pred_dy = tf.gradients(prediction, x2_ph)
    pred_dy2 = tf.gradients(tf.gradients(prediction, x2_ph), x2_ph)

Assuming N training points along each axis, x_ph has shape (N**2, 2) (it is the 2D input to the function), and x1_ph and x2_ph contain the first and second columns of x_ph, respectively. The lines that are supposed to compute the second derivatives throw this error:

File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py", line 630, in gradients
    gate_gradients, aggregation_method, stop_gradients)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py", line 683, in _GradientsHelper
    gradient_uid)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py", line 239, in _DefaultGradYs
    with _maybe_colocate_with(y.op, gradient_uid, colocate_gradients_with_ops):
AttributeError: 'NoneType' object has no attribute 'op'

For what it's worth, the code works fine when I have a 1D function and compute the second derivative as above. I assume I'm missing something obvious about the data structures in the neural network that is causing the error. Does anyone know what's wrong? The following MWE, which differentiates a known function instead of the network, works fine:

# Load Modules
import tensorflow as tf
import numpy as np
import math, random
import matplotlib.pyplot as plt
from pylab import meshgrid,cm,imshow,contour,clabel,colorbar,axis,title,show
from mpl_toolkits.mplot3d import Axes3D

# Create the 1D grids xin and yin that sample the input domain [a, b)
N = 40
a = 0.0
b = 2.0*np.pi
xin = np.arange(a, b, (b-a)/N).reshape((N,1))
yin = np.arange(a, b, (b-a)/N).reshape((N,1))

X_tmp,Y_tmp = meshgrid(xin,yin)
X = np.reshape(X_tmp,(N**2,1))
Y = np.reshape(Y_tmp,(N**2,1))

# This is the exact second partial of Z = sin(x+y) with respect to x
Zxx = -np.sin(X_tmp+Y_tmp)


# Define placeholders for the inputs x and y, and the function z = sin(x+y) to differentiate
x = tf.placeholder('float', [N**2,1])
y = tf.placeholder('float', [N**2,1])
z = tf.sin(x+y)

var_grad = tf.gradients(tf.gradients(z,x), x)

with tf.Session() as session:
    var_grad_val = session.run(var_grad,feed_dict={x:X, y:Y}) 
    grad1 = np.reshape(var_grad_val,(N,N))

    fig = plt.figure()
    ax = Axes3D(plt.gcf())
    surf = ax.plot_surface(X_tmp, Y_tmp, grad1, cmap=cm.coolwarm)
    plt.show()

    fig = plt.figure()
    ax = Axes3D(plt.gcf())
    surf = ax.plot_surface(X_tmp, Y_tmp, abs(grad1-Zxx), cmap=cm.coolwarm)
    plt.show()
user1799323
  • The error that you are getting is because the gradient has become `None` at some point in the chain, meaning something was not differentiable. This can be because you used a non-differentiable operation or because `prediction` is not computed from `x1_ph` / `x2_ph` (i.e. there is no path between the tensors in the graph) - this could be the case if you got `x1_ph` and `x2_ph` from `x_ph`, and not the other way around. Please show the definition of these. Also, there is [`tf.hessians`](https://www.tensorflow.org/api_docs/python/tf/hessians) if you are interested. – jdehesa Feb 11 '19 at 11:01
  • Also similar/related question: [Compute hessian with respect to several variables in tensorflow](https://stackoverflow.com/q/54112504). – jdehesa Feb 11 '19 at 11:02
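
The situation described in the first comment can be sketched as follows. This is a minimal illustration, not the asker's actual code: `tiny_model` is a hypothetical stand-in for `neural_network_model`, the fixed weights `W1` and `W2` are made up, and the assumption is that `x1_ph` and `x2_ph` were sliced out of `x_ph`. It uses the `tf.compat.v1` graph-mode API so that it should also run on TensorFlow 2.

```python
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()  # use TF1-style graphs and placeholders

# Hypothetical fixed weights; a real network would use trainable variables.
W1 = np.array([[0.5, -0.3, 0.8], [0.2, 0.7, -0.4]], dtype=np.float32)
W2 = np.array([[1.0], [-1.0], [0.5]], dtype=np.float32)

def tiny_model(inputs):
    """Hypothetical stand-in for neural_network_model: one tanh hidden layer."""
    hidden = tf.tanh(tf.matmul(inputs, tf.constant(W1)))
    return tf.matmul(hidden, tf.constant(W2))

# Case 1 (assumed cause): the column is sliced FROM x_ph, so the prediction
# is not computed from it and there is no path between the two tensors.
x_ph = tf.placeholder(tf.float32, [None, 2])
x1_sliced = x_ph[:, 0:1]
prediction = tiny_model(x_ph)
grad_sliced = tf.gradients(prediction, x1_sliced)

# Case 2: x1_ph and x2_ph are the placeholders, and the network input is
# built from them, so the prediction depends on each column.
x1_ph = tf.placeholder(tf.float32, [None, 1])
x2_ph = tf.placeholder(tf.float32, [None, 1])
connected_pred = tiny_model(tf.concat([x1_ph, x2_ph], axis=1))
pred_dx = tf.gradients(connected_pred, x1_ph)[0]
pred_dx2 = tf.gradients(pred_dx, x1_ph)[0]

# Three arbitrary sample points (x, y), one per row.
pts = np.array([[0.1, 0.2], [0.5, -0.3], [1.0, 1.5]], dtype=np.float32)
with tf.Session() as session:
    dx2_val = session.run(
        pred_dx2, feed_dict={x1_ph: pts[:, 0:1], x2_ph: pts[:, 1:2]})

print(grad_sliced)    # a list whose only entry is None
print(dx2_val.shape)  # one second derivative per input row
```

In the first case, passing the `[None]` result as `ys` to another `tf.gradients` call is the kind of input that would be expected to produce the `'NoneType' object has no attribute 'op'` error in the traceback. Whether this is what happens in the question depends on how `x1_ph` and `x2_ph` are actually defined, which the question does not show.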
