Questions tagged [automatic-differentiation]

Also known as algorithmic differentiation, or AD for short. Techniques that take a procedure evaluating a numerical function and transform it into a procedure that additionally evaluates directional derivatives, gradients, or higher-order derivatives.

Techniques include

  • operator overloading for dual numbers (see the sketch after this list),
  • operator overloading to extract the sequence of operations as a tape,
  • source code analysis and transformation.
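
As a concrete illustration of the first bullet, a minimal forward-mode sketch in Python, using operator overloading on a hand-rolled Dual class (the class and the example function are illustrative only, not taken from any particular library):

    # minimal dual-number forward mode via operator overloading (illustrative only)
    class Dual:
        def __init__(self, val, dot=0.0):
            self.val, self.dot = val, dot      # function value and directional derivative

        def __add__(self, other):
            other = other if isinstance(other, Dual) else Dual(other)
            return Dual(self.val + other.val, self.dot + other.dot)

        def __mul__(self, other):
            other = other if isinstance(other, Dual) else Dual(other)
            return Dual(self.val * other.val,
                        self.dot * other.val + self.val * other.dot)

    def f(x):
        return x * x + x + 3.0                 # ordinary code built from overloaded ops

    y = f(Dual(2.0, 1.0))                      # seed dx/dx = 1 at x = 2
    print(y.val, y.dot)                        # 9.0 and f'(2) = 5.0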

For a function with input of dimension n and output of dimension m, requiring L elementary operations for its evaluation, one directional derivative (forward mode) or one gradient (reverse mode) can be computed with roughly 3*L to 4*L operations.

The accuracy of the derivative obtained this way is automatically nearly as good as the accuracy of the function evaluation itself.

Other differentiation methods are

  • symbolic differentiation, where an expanded expression for the derivative is built first, which can become very large (expression swell) depending on the implementation, and
  • numerical differentiation by divided differences, which gives less accuracy for comparable effort, or comparable accuracy only at higher effort (see the comparison sketch after this list).
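
To make the accuracy comparison concrete, a small sketch (the AD side uses JAX, and the test function is arbitrary) contrasting an AD derivative with a central divided difference:

    import jax
    jax.config.update("jax_enable_x64", True)    # double precision for a fair comparison
    import jax.numpy as jnp

    f = lambda x: jnp.sin(x) * jnp.exp(x)
    fprime = lambda x: (jnp.sin(x) + jnp.cos(x)) * jnp.exp(x)   # derivative by hand

    x, h = 1.0, 1e-5
    ad = jax.grad(f)(x)                          # automatic differentiation
    fd = (f(x + h) - f(x - h)) / (2 * h)         # central divided difference

    print(abs(ad - fprime(x)))   # error near machine precision
    print(abs(fd - fprime(x)))   # error limited by the truncation/round-off trade-off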

See Wikipedia and autodiff.org.

192 questions
2
votes
0 answers

Automatic differentiation with custom data types

I'm facing a problem while trying to differentiate custom data types using the Haskell ad library. There is a related question here, which has been helpful, but I feel it might be insufficient in this case. Here is a simplified version of the issue…
tero
  • 153
  • 4
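
The question above concerns Haskell's ad library; to keep all examples here in one language, the analogous situation in JAX is differentiating through a custom container registered as a pytree (the Point class and energy function are invented for illustration):

    import jax
    import jax.numpy as jnp
    from jax.tree_util import register_pytree_node_class

    @register_pytree_node_class
    class Point:                                  # a custom data type holding parameters
        def __init__(self, x, y):
            self.x, self.y = x, y

        def tree_flatten(self):
            return (self.x, self.y), None         # leaves, static auxiliary data

        @classmethod
        def tree_unflatten(cls, aux, leaves):
            return cls(*leaves)

    def energy(p):                                # scalar function of the custom type
        return p.x ** 2 + jnp.sin(p.y)

    g = jax.grad(energy)(Point(1.0, 2.0))         # gradient has the same structure
    print(g.x, g.y)                               # 2.0 and cos(2.0)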
2
votes
1 answer

Change Fortran compile order in NetBeans 8

I'm working in NetBeans 8 on CentOS 7 to modify some old Fortran code, replacing numerical differentiation with automatic differentiation using OpenAD. OpenAD takes an annotated Fortran function as input and generates an automatically…
ShadSterling
  • 1,792
  • 1
  • 21
  • 40
1
vote
1 answer

Automatic differentiation and getting the next representable floating-point value

Getting the next representable floating point number of type T greater than a given T x can be achieved by calling std::nextafter(x) or, assuming T = double, next_double_up(x), where double next_double_up(double v) { if (std::isinf(v) && v > 0) …
0xbadf00d
  • 17,405
  • 15
  • 67
  • 107
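
For reference next to the question above, Python exposes the same "next representable value" operation (math.nextafter and math.ulp need Python 3.9+). Viewed as a real-valued function, the map is piecewise constant, so its pointwise derivative is zero wherever it is defined; how an AD tool treats it depends on how the primitive is registered:

    import math
    import numpy as np

    x = 1.0
    print(math.nextafter(x, math.inf))                        # next double above x
    print(np.nextafter(np.float32(x), np.float32(np.inf)))    # same idea in float32

    # the gap to the next float is one ulp of x
    print(math.nextafter(x, math.inf) - x, math.ulp(x))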
1
vote
0 answers

How to use parametric typing with structs that have many fields of different types

I have a struct that has quite a few different fields. Initially, I can use parametric typing for all of the fields. For example: struct MyStruct{TF, TI, TB} a::TF b::TF c::Array{TF, 2} d::TI e::TI f::TB end Now I'm using…
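
The question above is Julia-specific; a loose Python analogue, kept here only to stay in one language, is a generic dataclass whose float-like slots can hold whatever number type an AD tool substitutes (field names mirror the excerpt, the type parameters are placeholders):

    from dataclasses import dataclass
    from typing import Generic, TypeVar

    TF = TypeVar("TF")   # "float-like" slot (a plain float, or an AD dual/tracer type)
    TI = TypeVar("TI")   # integer-like slot
    TB = TypeVar("TB")   # boolean-like slot

    @dataclass
    class MyStruct(Generic[TF, TI, TB]):
        a: TF
        b: TF
        c: list[TF]      # stand-in for the 2-D array field in the question
        d: TI
        e: TI
        f: TB

    s = MyStruct[float, int, bool](1.0, 2.0, [0.5, 0.5], 3, 4, True)
    print(s)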
1
vote
1 answer

How can I implement a vmappable sum over a dynamic range in Jax?

I want to implement something like the following Python function in Jax, and wrap it with a call to vmap. I want it to be fully reverse-mode differentiable (with respect to x) using grad(), even after the vmap. def f(x,kmax): return sum ([x**k for…
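
One common way to approach the question above is to replace the dynamic range with a static upper bound plus a mask, which keeps shapes static so both vmap and grad work; KMAX_BOUND below is an assumed problem-specific constant:

    import jax
    import jax.numpy as jnp

    KMAX_BOUND = 16                                  # assumed static upper bound on kmax

    def f(x, kmax):
        ks = jnp.arange(KMAX_BOUND)                  # fixed-length range of exponents
        terms = jnp.where(ks < kmax, x ** ks, 0.0)   # zero out terms with k >= kmax
        return terms.sum()

    x = jnp.array([1.5, 2.0])
    kmax = jnp.array([3, 5])
    print(jax.vmap(f)(x, kmax))                      # batched sums
    print(jax.vmap(jax.grad(f))(x, kmax))            # reverse-mode gradients w.r.t. x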
1
vote
1 answer

Confused about evaluating vector-Jacobian-product with non-identity vectors (JAX)

I'm confused about the meaning of evaluating vector-Jacobian-products when the vector used for the VJP is a non-identity row vector. My question pertains to vector-valued functions, not scalar functions like loss. I will show a concrete example…
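
For the question above, a VJP evaluated with a non-identity row vector v returns v @ J(x), i.e. a weighted combination of the Jacobian's rows; a small check with an arbitrary function whose Jacobian is easy to verify:

    import jax
    import jax.numpy as jnp

    def f(x):                                   # vector-valued: R^2 -> R^2
        return jnp.array([x[0] ** 2 + x[1], 3.0 * x[0] * x[1]])

    x = jnp.array([1.0, 2.0])
    y, vjp_fn = jax.vjp(f, x)

    v = jnp.array([2.0, -1.0])                  # non-identity row vector
    (vJ,) = vjp_fn(v)                           # pullback: v @ J(x)

    J = jax.jacfwd(f)(x)                        # full Jacobian for comparison
    print(vJ)
    print(v @ J)                                # matches vJ up to rounding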
1
vote
1 answer

How to use and interpret JAX Vector-Jacobian Product (VJP) for this example?

I am trying to learn how to find the Jacobian of a vector-valued ODE function using JAX. I am using the examples at https://implicit-layers-tutorial.org/implicit_functions/ That page implements its own ODE integrator and associated custom…
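
Related to the question above (the linked tutorial's integrator is not reproduced here), one way to see how VJPs yield the Jacobian of an ODE right-hand side is to pull back each standard basis vector and stack the results; ode_rhs below is a made-up stand-in:

    import jax
    import jax.numpy as jnp

    def ode_rhs(y):                              # hypothetical R^3 -> R^3 right-hand side
        return jnp.array([-y[0], y[0] - 2.0 * y[1], y[1] * y[2]])

    y0 = jnp.array([1.0, 0.5, 2.0])
    _, vjp_fn = jax.vjp(ode_rhs, y0)

    basis = jnp.eye(3)                           # one basis (co)vector per output
    J_rows = jax.vmap(lambda e: vjp_fn(e)[0])(basis)

    print(jnp.allclose(J_rows, jax.jacrev(ode_rhs)(y0)))   # True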
1
vote
1 answer

Using JuMP inside a Turing model

My question is: Can one use a JuMP optimization problem inside a Turing model? Below is a minimal example of the functionality that I'm trying to obtain: using Turing using JuMP import Ipopt function find_u(θ) # Describe the model model =…
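
The question above is about Julia's JuMP and Turing; staying in one language here, a common pattern for the same situation is to make the inner solve itself AD-traceable, for example by unrolling a fixed number of gradient steps so that derivatives with respect to θ flow through it (the inner objective is invented for illustration):

    import jax
    import jax.numpy as jnp

    def find_u(theta, steps=100, lr=0.1):
        # inner "solver": unrolled gradient descent on an invented objective in u
        inner_obj = lambda u: (u - theta) ** 2 + 0.1 * u ** 4
        u = 0.0
        for _ in range(steps):                   # unrolled, hence traceable by AD
            u = u - lr * jax.grad(inner_obj)(u)
        return u

    outer = lambda theta: jnp.sin(find_u(theta)) # downstream use of the inner optimum
    print(jax.grad(outer)(1.3))                  # derivative w.r.t. theta through the solve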
1
vote
2 answers

Computational complexity of higher-order derivatives with AD in JAX

Let f: R -> R be an infinitely differentiable function. What is the computational complexity of calculating the first n derivatives of f in Jax? Naive chain rule would suggest that each multiplication gives a factor of 2 increase, hence the nth…
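
For the complexity question above, one concrete (if informal) way to look at it is to count the primitive equations in the jaxpr that nested jax.grad produces at each derivative order; the test function is arbitrary:

    import jax
    import jax.numpy as jnp

    f = lambda x: jnp.sin(x) * jnp.exp(x)

    def nth_grad(fun, n):                        # n-fold nested reverse mode
        for _ in range(n):
            fun = jax.grad(fun)
        return fun

    x = 0.7
    print([float(nth_grad(f, n)(x)) for n in range(1, 5)])          # derivative values

    # rough cost proxy: how many primitive equations each traced derivative contains
    print([len(jax.make_jaxpr(nth_grad(f, n))(x).jaxpr.eqns) for n in range(1, 5)])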
1
vote
1 answer

Can tf.GradientTape() calculate the gradient of another library's function?

If I include inside the tf.GradientTape() context some functions from other Python libraries, like `sklearn.decomposition.PCA.inverse_transform()`, can TensorFlow calculate gradients from that function? Specifically, can tf automatically differentiate…
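
A minimal sketch of the issue in the question above: tf.GradientTape records only TensorFlow operations, so a detour through NumPy (scikit-learn computes with NumPy internally) breaks the traced path, while the same arithmetic written in TF ops does not:

    import numpy as np
    import tensorflow as tf

    x = tf.Variable([1.0, 2.0])

    with tf.GradientTape(persistent=True) as tape:
        y_np = tf.constant(np.square(x.numpy()).sum())   # numpy detour: not recorded
        y_tf = tf.reduce_sum(tf.square(x))               # same math in TF ops

    print(tape.gradient(y_np, x))   # None: no recorded path back to x
    print(tape.gradient(y_tf, x))   # tf.Tensor([2. 4.], shape=(2,), dtype=float32)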
1
vote
2 answers

Trouble writing OptimizationFunction for automatic forward differentiation during Parameter Estimation of an ODEProblem

I am trying to learn Julia for its potential use in parameter estimation. I am interested in estimating kinetic parameters of chemical reactions, which usually involves optimizing reaction parameters with multiple independent batches of experiments.…
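
The question above targets Julia's SciML stack; as a rough analogue in one language, a sketch of estimating a single kinetic parameter of a toy ODE by gradient descent on a least-squares loss, with the derivative taken in forward mode (model, step sizes and learning rate are all invented):

    import jax
    import jax.numpy as jnp

    # toy first-order kinetics dy/dt = -k*y, integrated with a fixed-step Euler scheme
    def simulate(k, y0=1.0, dt=0.05, n_steps=100):
        def step(y, _):
            y_next = y + dt * (-k * y)
            return y_next, y_next
        _, traj = jax.lax.scan(step, y0, None, length=n_steps)
        return traj

    data = simulate(0.8)                          # "measurements" from the true k = 0.8

    def loss(k):
        return jnp.mean((simulate(k) - data) ** 2)

    dloss = jax.jit(jax.jacfwd(loss))             # forward-mode derivative, one parameter
    k = 0.3
    for _ in range(500):                          # plain gradient descent on k
        k = k - 0.2 * dloss(k)
    print(k)                                      # close to 0.8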
1
vote
1 answer

JAX automatic differentiation

I have the following three functions implemented in JAX. def helper_1(params1): ...calculations... return z def helper_2(z, params2): ...calculations... return y def main(params1, params2): z = helper_1(params1) y = helper_2(z,…
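
For the question above, the composition itself is not an obstacle: jax.grad traces through the helper calls, and argnums selects which arguments to differentiate with respect to (the helper bodies below are placeholders, since the original calculations are elided):

    import jax
    import jax.numpy as jnp

    def helper_1(params1):
        return jnp.sum(params1 ** 2)              # placeholder calculation

    def helper_2(z, params2):
        return z * jnp.sum(jnp.sin(params2))      # placeholder calculation

    def main(params1, params2):
        z = helper_1(params1)
        return helper_2(z, params2)

    p1 = jnp.array([1.0, 2.0])
    p2 = jnp.array([0.1, 0.2])

    # gradients w.r.t. both arguments flow through the composed helpers
    g1, g2 = jax.grad(main, argnums=(0, 1))(p1, p2)
    print(g1, g2)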
1
vote
1 answer

How to take a derivative of one of the outputs of a neural network (involving batched inputs) with respect to inputs?

I am solving a PDE using a neural network. My neural network is as follows: def f(params, inputs): for w, b in params: outputs = jnp.dot(inputs, w) + b inputs = jnn.swish(outputs) return outputs The layer architecture of the network is…
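
A sketch of one way to do what the question above asks, using the same layer structure as the excerpt: write the network for a single sample, take the gradient of the chosen output component with respect to the inputs, then vmap over the batch (layer sizes and initialization are made up):

    import jax
    import jax.numpy as jnp
    import jax.nn as jnn

    def f(params, inputs):
        # dense layers with swish, as in the question; returns the last pre-activation
        for w, b in params:
            outputs = jnp.dot(inputs, w) + b
            inputs = jnn.swish(outputs)
        return outputs

    key = jax.random.PRNGKey(0)
    sizes = [3, 8, 2]                              # assumed layer widths
    params = [(jax.random.normal(key, (m, n)) * 0.1, jnp.zeros(n))
              for m, n in zip(sizes[:-1], sizes[1:])]

    # derivative of output component 0 w.r.t. the inputs, for a single sample
    def dout0_dx(x_single):
        return jax.grad(lambda x: f(params, x)[0])(x_single)

    batch = jax.random.normal(key, (5, 3))         # 5 samples, 3 features
    print(jax.vmap(dout0_dx)(batch).shape)         # (5, 3)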
1
vote
1 answer

Why does jax.grad(lambda v: jnp.linalg.norm(v-v))(jnp.ones(2)) produce nans?

Can someone explain the following behaviour? Is it a bug? from jax import grad import jax.numpy as jnp x = jnp.ones(2) grad(lambda v: jnp.linalg.norm(v-v))(x) # returns DeviceArray([nan, nan], dtype=float32) grad(lambda v: jnp.linalg.norm(0))(x) #…
Deecer
  • 51
  • 3
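
The behaviour in the question above comes from the square root: the derivative of sqrt at 0 is infinite, and multiplying it by the zero inner derivative yields nan. A common workaround is the "double where" trick, a norm that never evaluates sqrt at zero (the eps threshold is an arbitrary choice):

    import jax
    import jax.numpy as jnp

    x = jnp.ones(2)
    print(jax.grad(lambda v: jnp.linalg.norm(v - v))(x))    # [nan nan], as in the question

    def safe_norm(u, eps=1e-12):
        sq = jnp.sum(u ** 2)
        # inner where keeps sqrt away from 0, outer where restores the value 0
        return jnp.where(sq > eps, jnp.sqrt(jnp.where(sq > eps, sq, 1.0)), 0.0)

    print(jax.grad(lambda v: safe_norm(v - v))(x))          # [0. 0.]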
1
vote
1 answer

Is PyTorch's torch.linalg.qr differentiable?

I have a neural network that involves the calculation of the QR decomposition of the input matrix X. This matrix is rectangular and has maximal rank. My question is whether this operation still allows the gradients to propagate backward during…
Dadeslam
  • 201
  • 1
  • 8
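
For the question above, a small check (shapes and loss are arbitrary): torch.linalg.qr in its default reduced mode supports backpropagation for a tall, full-rank input, so gradients do reach X:

    import torch

    X = torch.randn(5, 3, requires_grad=True)     # tall, full rank with probability 1
    Q, R = torch.linalg.qr(X, mode="reduced")     # reduced QR, the differentiable case

    loss = Q.sum() + R.diagonal().abs().sum()     # arbitrary scalar built from Q and R
    loss.backward()
    print(X.grad.shape)                           # torch.Size([5, 3])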