Questions tagged [automatic-differentiation]

Also known as algorithmic differentiation, short AD. Techniques that take a procedure evaluating a numerical function and transform it into a procedure that additionally evaluates directional derivatives, gradients, higher order derivatives.

Techniques include

  • operator overloading for dual numbers (a minimal sketch follows this list),
  • operator overloading to extract the operation sequence as a tape,
  • code analysis and transformation.
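As a concrete illustration of the dual-number technique, here is a minimal forward-mode sketch in plain Python; the Dual class and the example function are made up for this illustration and not taken from any library. Each elementary operation propagates a value together with a derivative.

    import math

    class Dual:
        """Pair (value, derivative) propagated through each operation."""
        def __init__(self, val, dot=0.0):
            self.val = val   # function value
            self.dot = dot   # directional derivative carried along

        def _wrap(self, other):
            return other if isinstance(other, Dual) else Dual(other)

        def __add__(self, other):
            other = self._wrap(other)
            return Dual(self.val + other.val, self.dot + other.dot)

        __radd__ = __add__

        def __mul__(self, other):
            other = self._wrap(other)
            # product rule: (uv)' = u'v + uv'
            return Dual(self.val * other.val,
                        self.dot * other.val + self.val * other.dot)

        __rmul__ = __mul__

    def sin(x):
        # chain rule for an elementary function
        return Dual(math.sin(x.val), math.cos(x.val) * x.dot)

    # f(x) = x*x + sin(x); seeding dot = 1 yields df/dx at x = 1.5
    x = Dual(1.5, 1.0)
    y = x * x + sin(x)
    print(y.val, y.dot)   # derivative equals 2*1.5 + cos(1.5)

Seeding dot with one unit direction at a time yields one directional derivative per forward pass; reverse mode (the tape approach) instead produces a full gradient in a single backward pass.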

For a function with input of dimension n and output of dimension m, requiring L elementary operations for its evaluation, one directional derivative or one gradient can be computed with about 3*L operations.

The accuracy of the derivative is, automatically, nearly as good as the accuracy of the function evaluation.

Other differentiation methods are

  • symbolic differentiation, where the expanded expression for the derivatives is obtained first, which can be large depending on the implementation, and
  • numerical differentiation by divided differences, which provides less accuracy with comparable effort, or comparable accuracy with higher effort (see the short comparison below).
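As a small illustration of that accuracy remark, the following plain-Python snippet compares divided differences with the exact derivative; exp is chosen only because its derivative is known exactly.

    import math

    f = math.exp
    x = 1.0
    exact = math.exp(x)   # d/dx exp(x) = exp(x)
    for h in (1e-2, 1e-5, 1e-8, 1e-11):
        forward = (f(x + h) - f(x)) / h            # O(h) truncation error
        central = (f(x + h) - f(x - h)) / (2 * h)  # O(h^2) truncation error
        print(h, abs(forward - exact), abs(central - exact))
    # The error first shrinks with h and then grows again because of cancellation.
    # AD involves no step size, so it avoids this trade-off entirely.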

See Wikipedia and autodiff.org.

192 questions
2 votes, 0 answers

Automatic differentiation using expression templates c++

Introduction I am trying to learn about expression templates because it seems to be a very powerful technique for a wide range of calculations. I looked at different examples online (e.g. wikipedia), and I wrote a bunch of small programs that do…
2 votes, 0 answers

Is there a way to define the gradient of a 3D FFT using tensorflow's custom_gradient decorator

Context & problem I am using the Hamiltonian Monte Carlo (HMC) method of the tensorflow-probability module to explore the most probable states of a self-written probability function. Amongst the parameters I am trying to fit are Fourier modes of a…
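Not the FFT case itself, but a hedged sketch of the tf.custom_gradient mechanics the question asks about, using the well-known log1pexp example: the decorated function returns its value together with a function that maps upstream gradients to downstream gradients.

    import tensorflow as tf

    @tf.custom_gradient
    def log1pexp(x):
        e = tf.exp(x)
        def grad(upstream):
            # hand-written derivative, numerically safer than the default graph
            return upstream * (1.0 - 1.0 / (1.0 + e))
        return tf.math.log(1.0 + e), grad

    x = tf.constant(100.0)
    with tf.GradientTape() as tape:
        tape.watch(x)
        y = log1pexp(x)
    print(tape.gradient(y, x))   # about 1.0, where the naive formula gives NaN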
2 votes, 1 answer

Tensorflow loses track of variables/gradients after multiplication with constant tensor

I have a tensorflow model with some custom tensorflow layer. I build my tf.Variables in the build() method by calling self.add_weight() as it should be done. I then multiply these weights with some other constant tensor before calling (consider it…
2 votes, 1 answer

How to use the CDF function in the Turing module of Julia?

I want to know the parameters of the Gamma distribution of Awareness. Awareness is 0.336 at 1 week, 0.554 at 4 weeks, and 0.64 at 13 weeks. The data set is built here; the data are CDF values of the Gamma distribution. xs = [ 1 0.336 ; 4 0.554 ; 13 0.64 ] I coded this in Julia…
2 votes, 2 answers

Repeated use of GradientTape for multiple Jacobian calculations

I am attempting to compute the Jacobian of a TensorFlow neural network's outputs with respect to its inputs. This is easily achieved with the tf.GradientTape.jacobian method. The trivial example provided in the TensorFlow documentation is as…
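For reference, a minimal sketch of the pattern being discussed; the small model and shapes are only assumptions for illustration. tf.GradientTape.jacobian returns the full Jacobian of the recorded target with respect to a watched source.

    import tensorflow as tf

    model = tf.keras.Sequential([tf.keras.layers.Dense(3, activation="tanh"),
                                 tf.keras.layers.Dense(2)])
    x = tf.random.normal([4, 5])      # batch of 4 inputs of dimension 5

    with tf.GradientTape() as tape:
        tape.watch(x)                 # x is a plain tensor, so watch it explicitly
        y = model(x)                  # outputs, shape [4, 2]

    jac = tape.jacobian(y, x)         # d outputs / d inputs, shape [4, 2, 4, 5]
    print(jac.shape)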
2 votes, 1 answer

Jacobian determinant of vector-valued function with Python JAX/Autograd

I have a function that maps vectors onto vectors and I want to calculate its Jacobian determinant, where the Jacobian is the matrix of partial derivatives ∂f_i/∂x_j. Since I can use numpy.linalg.det to compute the determinant, I just need the Jacobian matrix. I know about…
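A minimal sketch of one way to do this with JAX; the example map f is made up for illustration. jax.jacfwd builds the Jacobian, and jnp.linalg.det takes its determinant.

    import jax
    import jax.numpy as jnp

    def f(x):                         # example map R^3 -> R^3
        return jnp.stack([x[0] * x[1], jnp.sin(x[2]), x[0] + x[2] ** 2])

    jac_f = jax.jacfwd(f)             # function returning the Jacobian matrix

    x0 = jnp.array([1.0, 2.0, 0.5])
    J = jac_f(x0)                     # shape (3, 3)
    print(jnp.linalg.det(J))          # Jacobian determinant at x0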
2 votes, 1 answer

Compute gradient of the outputs wrt the weights

Starting from a tensorflow model, I would like to be able to retrieve the gradient of the outputs with respect to the weights. Backpropagation aims to compute the gradient of the loss wrt the weights, in order to do that somewhere in the code the…
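A hedged sketch of one way to obtain those gradients, assuming a small Keras model built only for illustration; tape.jacobian accepts the list of trainable variables and returns one Jacobian per weight tensor.

    import tensorflow as tf

    model = tf.keras.Sequential([tf.keras.layers.Dense(2, input_shape=(3,))])
    x = tf.random.normal([1, 3])

    with tf.GradientTape() as tape:
        y = model(x)                              # outputs, shape [1, 2]

    # d outputs / d kernel and d outputs / d bias, one entry per variable
    jacs = tape.jacobian(y, model.trainable_variables)
    for v, j in zip(model.trainable_variables, jacs):
        print(v.name, j.shape)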
2 votes, 1 answer

Why do automatic differentiation and gradient tape need to use a context manager?

Context managers can turn two related operations into one. For example: with open('some_file', 'w') as opened_file: opened_file.write('Hola!') The above code is equivalent to: file = open('some_file', 'w') try: …
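For context, a small sketch of what the with block buys here: operations are recorded on the tape only while the context is active. persistent=True is used only so gradient can be called twice in this example.

    import tensorflow as tf

    x = tf.Variable(3.0)

    with tf.GradientTape(persistent=True) as tape:
        y = x * x            # recorded: executed inside the context

    z = x * x * x            # not recorded: executed after the tape closed

    print(tape.gradient(y, x))   # tf.Tensor(6.0, ...)
    print(tape.gradient(z, x))   # None: z was never traced by this tape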
2 votes, 1 answer

Update step in PyTorch implementation of Newton's method

I'm trying to get some insight into how PyTorch works by implementing Newton's method for solving x = cos(x). Here's a version that works: x = Variable(DoubleTensor([1]), requires_grad=True) for i in range(5): y = x - torch.cos(x) …
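A hedged sketch of the same iteration in current PyTorch (the Variable wrapper in the excerpt is long deprecated); torch.autograd.grad supplies the derivative for the Newton step.

    import torch

    x = torch.tensor(1.0, dtype=torch.float64, requires_grad=True)
    for _ in range(5):
        y = x - torch.cos(x)                   # seek the root of y(x) = 0
        (dy_dx,) = torch.autograd.grad(y, x)   # dy/dx = 1 + sin(x), via AD
        with torch.no_grad():
            x -= y / dy_dx                     # Newton update, kept off the graph
    print(x.item())                            # converges to about 0.739085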
2 votes, 1 answer

How to implement automatic differentiation in Haskell?

So I have a Dual number class: data Dual a = !a :+ !a instance [safe] Eq a => Eq (Dual a) instance [safe] RealFloat a => Floating (Dual a) instance [safe] RealFloat a => Fractional (Dual a) instance [safe] RealFloat a => Num (Dual a) instance [safe]…
2 votes, 3 answers

Breaking TensorFlow gradient calculation into two (or more) parts

Is it possible to use TensorFlow's tf.gradients() function in parts, that is, calculate the gradient of the loss w.r.t. some tensor, and of that tensor w.r.t. the weights, and then multiply them to get the original gradient from the loss to the…
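One way this chaining is commonly done, sketched under the assumption of TF1-style graph mode (tf.gradients is a graph-mode API): compute d loss / d tensor first, then feed it back in through grad_ys.

    import tensorflow.compat.v1 as tf
    tf.disable_eager_execution()

    w = tf.Variable([[1.0, 2.0]])
    x = tf.constant([[3.0], [4.0]])
    hidden = tf.matmul(w, x)              # intermediate tensor
    loss = tf.reduce_sum(hidden ** 2)

    (direct,) = tf.gradients(loss, w)     # full gradient in one call

    (dl_dh,) = tf.gradients(loss, hidden)                 # part 1: d loss / d hidden
    (chained,) = tf.gradients(hidden, w, grad_ys=dl_dh)   # part 2, chained

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        print(sess.run([direct, chained]))  # the two results agree by the chain rule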
2 votes, 0 answers

Haskell Linear + AD, implementing Metric for Forward?

I'm trying to use diff from the ad package on a function Quaternion a -> Quaternion a or more generally Metric a => a -> a relying on quadrance. I'm not sure what the best way to go about this is, since Forward doesn't have a Metric instance and…
2 votes, 2 answers

Computational Efficiency of Forward Mode Automatic vs Numeric vs Symbolic Differentiation

I am trying to solve a problem of finding the roots of a function using the Newton-Raphson (NR) method in the C language. The functions in which I would like to find the roots are mostly polynomial functions but may also contain trigonometric and…
2 votes, 1 answer

Calculate distance between n data points and k clusters in TensorFlow

X is a matrix of data points, n by d in shape. W is a matrix of cluster points, k by d in shape. The smallest distance between a datapoint, i, and each cluster can be calculated as follows: a_dist = tf.reduce_min(X[i] - W, 0); How can the distance…
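A short sketch of the broadcasting approach that is often used for this, with shapes assumed as in the question: X is [n, d], W is [k, d].

    import tensorflow as tf

    X = tf.random.normal([6, 3])   # n = 6 data points of dimension d = 3
    W = tf.random.normal([4, 3])   # k = 4 cluster centres

    # squared distance of every point to every centre, shape [n, k]
    sq_dist = tf.reduce_sum((X[:, None, :] - W[None, :, :]) ** 2, axis=2)

    nearest = tf.reduce_min(sq_dist, axis=1)   # distance to the closest centre
    print(sq_dist.shape, nearest.shape)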
2 votes, 0 answers

Automatic differentiation (AD) with respect to list of matrices in Haskell

I am trying to understand how I can use Numeric.AD (automatic differentiation) in Haskell. I defined a simple matrix type and a scalar function taking an array and two matrices as arguments. I want to use AD to get the gradient of the scoring…