Questions tagged [automatic-differentiation]

Also known as algorithmic differentiation, or AD for short. Techniques that take a procedure evaluating a numerical function and transform it into a procedure that additionally evaluates directional derivatives, gradients, and higher-order derivatives.

Techniques include

  • operator overloading for dual numbers,
  • operator overloading to extract the operation sequence as a tape,
  • code analysis and transformation.

For a function with input of dimension n and output of dimension m that requires L elementary operations for its evaluation, one directional derivative or one gradient can be computed with no more than about 3*L operations.
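As a rough illustration of the first technique above, here is a minimal forward-mode sketch built on operator overloading for dual numbers. It uses only the Python standard library; the Dual class and the function f are invented for this illustration, not taken from any particular package.

    # Minimal forward-mode AD via dual numbers (illustrative sketch only).
    class Dual:
        def __init__(self, value, deriv=0.0):
            self.value = value   # f(x)
            self.deriv = deriv   # f'(x), carried alongside the value

        def __add__(self, other):
            other = other if isinstance(other, Dual) else Dual(other)
            return Dual(self.value + other.value, self.deriv + other.deriv)

        __radd__ = __add__

        def __mul__(self, other):
            other = other if isinstance(other, Dual) else Dual(other)
            # product rule: (u*v)' = u'*v + u*v'
            return Dual(self.value * other.value,
                        self.deriv * other.value + self.value * other.deriv)

        __rmul__ = __mul__

    def f(x):
        return 3 * x * x + 2 * x   # any code built from the overloaded ops works

    x = Dual(2.0, 1.0)             # seed the derivative dx/dx = 1
    y = f(x)
    print(y.value, y.deriv)        # 16.0 and f'(2) = 14.0

The derivative comes out alongside the value in a single evaluation, which is where the small constant-factor overhead quoted above comes from.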

The accuracy of the derivative is, automatically, nearly as good as the accuracy of the function evaluation.

Other differentiation methods are

  • symbolic differentiation, where an expanded expression for the derivatives is obtained first, which can become very large depending on the implementation, and
  • numerical differentiation by divided differences, which provides less accuracy at comparable effort, or comparable accuracy at higher effort.
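For contrast with the dual-number sketch above, a quick divided-difference example in Python (the function and step size are invented for illustration): the step h trades truncation error against round-off error, which is why the accuracy is limited compared to AD at similar cost.

    import math

    def f(x):
        return math.sin(x)

    x, h = 1.0, 1e-5
    forward = (f(x + h) - f(x)) / h             # first-order divided difference
    central = (f(x + h) - f(x - h)) / (2 * h)   # second-order divided difference
    exact = math.cos(x)
    print(abs(forward - exact), abs(central - exact))   # roughly 4e-6 vs roughly 1e-11

No choice of h removes both error sources at once, whereas the dual-number derivative above is exact up to ordinary floating-point round-off.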

See Wikipedia and autodiff.org.

192 questions
4
votes
0 answers

Convoluted tree structure causes the GC to pause indefinitely

I am doing some machine learning self study and currently I am implementing reverse mode automatic differentiation as practice. The way the program works is by essentially overloading common expressions like multiplication, addition and so on and…
Marko Grdinić
  • 3,798
  • 3
  • 18
  • 21
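The question above concerns reverse mode built on operator overloading. A minimal, unoptimized sketch of that idea in Python (the Var class and backward function are invented for this example) looks roughly like this; note that the parent references keep the whole expression graph alive, which is exactly the kind of deeply nested object tree that can stress a garbage collector.

    # Minimal reverse-mode AD sketch: overloaded ops record a graph, backward() replays it.
    class Var:
        def __init__(self, value, parents=()):
            self.value = value
            self.parents = parents   # pairs of (parent node, local partial derivative)
            self.grad = 0.0

        def __add__(self, other):
            return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

        def __mul__(self, other):
            return Var(self.value * other.value,
                       ((self, other.value), (other, self.value)))

    def backward(output):
        # visit nodes in reverse topological order so each node's grad is
        # complete before it is propagated to its parents
        order, seen = [], set()
        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    visit(parent)
                order.append(node)
        visit(output)
        output.grad = 1.0
        for node in reversed(order):
            for parent, local in node.parents:
                parent.grad += local * node.grad

    x, y = Var(2.0), Var(3.0)
    z = x * y + x
    backward(z)
    print(x.grad, y.grad)   # dz/dx = y + 1 = 4.0, dz/dy = x = 2.0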
4
votes
1 answer

How to do automatic differentiation on complex datatypes?

Given a very simple Matrix definition based on Vector: import Numeric.AD import qualified Data.Vector as V newtype Mat a = Mat { unMat :: V.Vector a } scale' f = Mat . V.map (*f) . unMat add' a b = Mat $ V.zipWith (+) (unMat a) (unMat b) sub' a b…
fho
  • 6,787
  • 26
  • 71
4
votes
2 answers

How does theano implement computing every function's gradient?

I have a question about Theano's implementation. How does Theano get the gradient of every loss function via the following function (T.grad)? Thank you for your help. gparams = T.grad(cost, self.params)
Issac
  • 311
  • 1
  • 5
  • 10
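In short, T.grad works symbolically: it walks the computational graph of the cost expression, applies each op's registered gradient rule, and returns new symbolic expressions rather than numbers. A minimal sketch of the usual pattern (variable names invented; Theano itself is unmaintained, but its successors Aesara/PyTensor keep essentially the same API):

    import theano
    import theano.tensor as T

    x = T.dscalar('x')
    cost = x ** 2 + 3 * x           # a symbolic expression, not a number
    g = T.grad(cost, x)             # symbolic derivative: 2*x + 3
    f = theano.function([x], g)     # compile the gradient expression
    print(f(2.0))                   # 7.0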
4
votes
2 answers

Java - Computation of Derivatives with Apache Commons Math Library

I have a problem using the Apache Commons Math library. I just want to create functions like f(x) = 4x^2 + 2x and compute the derivative of this function, f'(x) = 8x + 2. I read the article about Differentiation…
4
votes
1 answer

Haskell ad package

I want to use the ad automatic differentiation package for learning neural network weights in Haskell. I have found some functions that might just have what I need, however I can't figure out what they expect as the first parameter. It must be the…
laci37
  • 510
  • 4
  • 17
3
votes
1 answer

Automatic Differentiation with respect to rank-based computations

I'm new to automatic differentiation programming, so this may be a naive question. Below is a simplified version of what I'm trying to solve. I have two input arrays - a vector A of size N and a matrix B of shape (N, M), as well as a parameter vector…
3
votes
0 answers

Why does tf.GradientTape() use less GPU memory when watching model variables manually?

So when I use tf.GradientTape() to automatically monitor the trainable variables in a resnet model, the computer threw an out of memory error. Below is the code: x_mini = preprocess_input(x_train) with tf.GradientTape() as tape: outputs =…
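One pattern relevant to this question is turning off automatic variable watching and watching only the tensors you need, so the tape records fewer intermediates. A small sketch with an invented toy layer (not the asker's resnet):

    import tensorflow as tf

    x = tf.random.normal((8, 4))
    dense = tf.keras.layers.Dense(2)
    _ = dense(x)                                   # build the layer so its variables exist

    with tf.GradientTape(watch_accessed_variables=False) as tape:
        tape.watch(dense.trainable_variables)      # watch only these variables
        loss = tf.reduce_sum(dense(x) ** 2)

    grads = tape.gradient(loss, dense.trainable_variables)
    print([g.shape for g in grads])                # kernel (4, 2) and bias (2,)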
3
votes
0 answers

How to obtain the Jacobian Matrix with respect to the inputs of a keras model neural network?

I recently started learning and using automatic differentiation to determine the gradients and Jacobian matrix of a neural network with respect to a given input. The methods suggested by TensorFlow are tape.gradient and tape.jacobian.…
Derrick
  • 31
  • 1
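For a Jacobian with respect to the inputs (rather than the trainable variables), one commonly watches the input tensor explicitly and then calls tape.jacobian. A small sketch with an invented toy model, not the asker's network:

    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.layers.Dense(3, activation='tanh'),
        tf.keras.layers.Dense(2),
    ])

    x = tf.random.normal((4, 5))    # batch of 4 inputs with 5 features

    with tf.GradientTape() as tape:
        tape.watch(x)               # inputs are plain tensors, so watch them explicitly
        y = model(x)

    J = tape.jacobian(y, x)         # shape (4, 2, 4, 5): d y[i, j] / d x[k, l]
    print(J.shape)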
3
votes
1 answer

Plotting output of ForwardDiff in Julia

I would just like to use the ForwardDiff.jl functionality to define a function and plot its gradient (evaluated using ForwardDiff.gradient). It seems not to be working because the output of ForwardDiff.gradient is this weird Dual type thing, and it's…
Conor
  • 691
  • 5
  • 14
3
votes
1 answer

How to use promote rule in Julia?

I'm trying to write a struct to compute the gradient (following https://www.youtube.com/watch?v=rZS2LGiurKY) this is what I have so far: struct GRAD{F <: Array{Float64,2}, ∇F <:Array{Float64,2}} f::F ∇f::∇F end begin import Base:…
3
votes
1 answer

Representing a computational graph in Haskell

I'm trying to write a simple automatic differentiation package in Haskell. What are the efficient ways to represent a type-safe (directed) computational graph in Haskell? I know that the ad package uses the "data-reify" method for that but I'm not…
3
votes
1 answer

Why do TensorFlow and PyTorch gradients of the eigenvalue decomposition differ from each other and the analytic solution?

The following code computes the eigenvalue decomposition of a real symmetric matrix. Then, the gradient of the first eigenvalue with respect to the matrix is computed. This is done three times: 1) using the analytic formula, 2) using TensorFlow, 3)…
3
votes
1 answer

Ranges with Dual Numbers

I am having an issue dealing with Dual numbers inside of ranges. Specifically: using ForwardDiff: Dual t = Dual.((0.0,10.0),0) (t[1]:1/60:t[2])[end] The issue seems to be that [end] uses last, which then wants to compute the number of steps, so…
Chris Rackauckas
  • 18,645
  • 3
  • 50
  • 81
3
votes
2 answers

Automatic Differentiation with CoDiPack

The following code: #include ... codi::RealForward Gcodi[l]; for (int p = 0; p < l; p++) { ... double a = Gcodi[p]; } gives me the compilation error: nnBFAD.cpp: In function ‘void OptBF()’: nnBFAD.cpp:156:25: error: cannot…
3
votes
2 answers

Julia ReverseDiff: how to take a gradient w.r.t. only a subset of inputs?

In my data flow, I'm querying a small subset of a database, using those results to construct about a dozen arrays, and then, given some parameter values, computing a likelihood value. Then repeating for a subset of the database. I want to compute…
James
  • 630
  • 1
  • 6
  • 15