I am testing different ways of computing partial derivatives for a simple mathematical graph and obtain a derivative > 0 when using TensorFlow 2 in eager execution mode, while other AD functions return a derivative of 0.
First, TensorFlow in eager execution (Python 3.7.7, TensorFlow 2.3.0):
import numpy as np
import tensorflow as tf
parm = tf.Variable(3.0)
with tf.GradientTape(persistent=True) as tape:
    target = (parm * 4)**2
    multpl = 2 / (target + 1)
    mod_out = (target + 1) * multpl
    mod_out = mod_out * 0.95 + mod_out
tape.gradient(mod_out, parm)
This returns a derivative of 1.7881393e-07. When writing the same function a little differently...
with tf.GradientTape(persistent=True) as tape:
    mod_out = ((2 / ((parm * 4)**2 + 1)) * ((parm * 4)**2 + 1)) * 0.95 + ((2 / ((parm * 4)**2 + 1)) * ((parm * 4)**2 + 1))
tape.gradient(mod_out, parm)
... a derivative of 0.0 is returned (I checked that "mod_out" returns the same value as above). Now, when using "deriv()" in R (R 3.6.3), I again obtain a derivative of 0.0:
dv <- deriv(~ ((2 / ((x * 4)**2 + 1)) * ((x * 4)**2 + 1)) * 0.95 + (2 / ((x * 4)**2 + 1)) * ((x * 4)**2 + 1), 'x')
x <- 3
eval(dv)
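A symbolic system cancels the matching factors exactly, which is presumably why deriv() reports exactly zero. The same cross-check can be sketched in SymPy (assuming it is installed; the float literal 0.95 is replaced by the exact rational 95/100 to keep everything symbolic):

```python
import sympy as sp

x = sp.symbols('x')
t = (x * 4)**2 + 1                                # target + 1
expr = ((2 / t) * t) * sp.Rational(95, 100) + (2 / t) * t

# SymPy cancels (2 / t) * t to 2 exactly, so the expression
# collapses to the constant 39/10 and its derivative is 0.
value = sp.simplify(expr)
deriv = sp.diff(expr, x)
print(value, deriv)
```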
And when using "optimize()" in R, the variable is never really optimized, likely because the objective is constant over the interval, so there is nothing for the optimizer to improve:
opt_func <- function(parm){
  target = (parm * 4)**2
  multpl = 2 / (target + 1)
  mod_out = (target + 1) * multpl
  mod_out = mod_out * 0.95 + mod_out
  return(mod_out)
}
optimize(opt_func, interval = c(0,5))
optimize(opt_func, interval = c(0,10))
optimize(opt_func, interval = c(0,1))
To my understanding, the expression simplifies algebraically to a constant (mod_out = (target + 1) * (2 / (target + 1)) = 2, so the output is 2 * 0.95 + 2 = 3.9 for any parm), so it is not a big surprise that a derivative of zero is returned in most cases. What does make me wonder is why the first example returns a derivative > 0.
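A quick numerical check that the objective really is flat, using a Python transcription of opt_func evaluated in float64 together with a central finite difference at parm = 3:

```python
def opt_func(parm):
    target = (parm * 4) ** 2
    multpl = 2 / (target + 1)
    mod_out = (target + 1) * multpl
    return mod_out * 0.95 + mod_out

# The function returns 3.9 for every input (up to float64 round-off) ...
vals = [opt_func(p) for p in (0.5, 3.0, 7.0)]

# ... so a central difference at parm = 3 is numerically zero.
h = 1e-6
fd = (opt_func(3.0 + h) - opt_func(3.0 - h)) / (2 * h)
print(vals, fd)
```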
Any insights would be much appreciated: I have used a structure similar to the first example in a complex optimization task with some success, and I would like to be able to explain how those gradients were actually being computed.