I am testing different ways of computing partial derivatives for a simple mathematical graph and obtain a derivative > 0 when using TensorFlow 2 in eager execution mode, while other AD functions return a derivative of 0.
First, TensorFlow in eager execution (Python 3.7.7, TensorFlow 2.3.0):
import numpy as np
import tensorflow as tf
parm = tf.Variable(3.0)
with tf.GradientTape(persistent=True) as tape:
    target = (parm * 4)**2
    multpl = 2 / (target + 1)
    mod_out = (target + 1) * multpl
    mod_out = mod_out * 0.95 + mod_out
tape.gradient(mod_out, parm)
This returns a derivative of 1.7881393e-07. When writing the same function a little differently...
with tf.GradientTape(persistent=True) as tape:
    mod_out = ((2 / ((parm * 4)**2 + 1)) * ((parm * 4)**2 + 1)) * 0.95 + ((2 / ((parm * 4)**2 + 1)) * ((parm * 4)**2 + 1))
tape.gradient(mod_out, parm)
... a derivative of 0.0 is returned (I checked that "mod_out" returns the same value as above). Now, when using "deriv()" in R (R 3.6.3), I again obtain a derivative of 0.0:
dv <- deriv(~ ((2 / ((x * 4)**2 + 1)) * ((x * 4)**2 + 1)) * 0.95 + (2 / ((x * 4)**2 + 1)) * ((x * 4)**2 + 1), 'x')
x <- 3
eval(dv)
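A symbolic system cancels the matching factors exactly, which is presumably why deriv() reports exactly zero. The same cross-check can be sketched in SymPy (assuming it is installed; the float literal 0.95 is replaced by the exact rational 95/100 to keep everything symbolic):

```python
import sympy as sp

x = sp.symbols('x')
t = (x * 4)**2 + 1                                # target + 1
expr = ((2 / t) * t) * sp.Rational(95, 100) + (2 / t) * t

# SymPy cancels (2 / t) * t to 2 exactly, so the expression
# collapses to the constant 39/10 and its derivative is 0.
value = sp.simplify(expr)
deriv = sp.diff(expr, x)
print(value, deriv)
```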
And when using "optimize()" in R, the variable is never really optimized, likely because the objective is constant over the interval, so there is nothing for the optimizer to improve:
opt_func <- function(parm){
  target = (parm * 4)**2
  multpl = 2 / (target + 1)
  mod_out = (target + 1) * multpl
  mod_out = mod_out * 0.95 + mod_out
  return(mod_out)
}
optimize(opt_func, interval = c(0,5))
optimize(opt_func, interval = c(0,10))
optimize(opt_func, interval = c(0,1))
To my understanding, the expression simplifies algebraically to a constant (mod_out = (target + 1) * (2 / (target + 1)) = 2, so the output is 2 * 0.95 + 2 = 3.9 for any parm), so it is not a big surprise that a derivative of zero is returned in most cases. What does make me wonder is why the first example returns a derivative > 0.
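A quick numerical check that the objective really is flat, using a Python transcription of opt_func evaluated in float64 together with a central finite difference at parm = 3:

```python
def opt_func(parm):
    target = (parm * 4) ** 2
    multpl = 2 / (target + 1)
    mod_out = (target + 1) * multpl
    return mod_out * 0.95 + mod_out

# The function returns 3.9 for every input (up to float64 round-off) ...
vals = [opt_func(p) for p in (0.5, 3.0, 7.0)]

# ... so a central difference at parm = 3 is numerically zero.
h = 1e-6
fd = (opt_func(3.0 + h) - opt_func(3.0 - h)) / (2 * h)
print(vals, fd)
```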
Any insights would be much appreciated: I have used a structure similar to the first example in a complex optimization task with some success, and I would like to be able to explain how those gradients were actually being computed.