
For quite some time I have been wondering how automatic differentiation works, and I am still confused about how forward mode works (I am not equipped to deal with reverse mode at the moment). I have tried reading the source code of some libraries (mainly autodiff) and some papers (e.g. FAD) to understand how people do it, with little success.

My main issue is that I don't get how dual numbers are used. For example, let's say we define a class of dual numbers (in C++) that holds two numbers: a value and a derivative. Then we can overload the mathematical functions and operators to define the dual number algebra (as in the complex number case). But then, and this is my problem, no matter what we do, we are only going to get first derivatives.
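To make the setup concrete, here is a minimal sketch of the kind of class I mean. It is not the code from my repository; the names `fwd`, `Dual`, `f` and `dfdx` are made up for illustration, and only a handful of operations are overloaded.

```cpp
#include <cmath>

namespace fwd {

// Hypothetical minimal dual number: a value plus one derivative component.
struct Dual {
    double val;  // f(x)
    double der;  // derivative of f along the seeded direction
};

// Sum and difference rules.
inline Dual operator+(Dual a, Dual b) { return {a.val + b.val, a.der + b.der}; }
inline Dual operator-(Dual a, Dual b) { return {a.val - b.val, a.der - b.der}; }

// Product rule.
inline Dual operator*(Dual a, Dual b) {
    return {a.val * b.val, a.der * b.val + a.val * b.der};
}

// Chain rule for two elementary functions.
inline Dual sin(Dual a) { return {std::sin(a.val), std::cos(a.val) * a.der}; }
inline Dual exp(Dual a) {
    const double e = std::exp(a.val);
    return {e, e * a.der};
}

// Example function, written once as a template: f(x, y) = x*y + sin(x).
template <typename T>
T f(T x, T y) {
    using std::sin;  // plain doubles use std::sin, Dual finds fwd::sin via ADL
    return x * y + sin(x);
}

// df/dx at (x, y): seed x with derivative 1 and y with derivative 0.
inline double dfdx(double x, double y) {
    return f(Dual{x, 1.0}, Dual{y, 0.0}).der;
}

}  // namespace fwd
```

With this sketch, `fwd::dfdx(2.0, 3.0)` evaluates y + cos(x) at (2, 3). Each evaluation yields the derivative along one seeded direction only, so a full gradient in n variables takes n passes (or a vector-valued `der`), and nothing beyond first order comes out of it.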

I keep reading about implementation of hyper-dual numbers, which are described as duals that store values, Jacobian, Hessian, etc. If this is true, then if I have a function of 15 variables and I need the third derivative wrt all of them, my computer is going to blow up... Since there are very efficient libraries out there that do such calculations, I am clearly missing something.

I don't have a specific coding question; I would appreciate any input on how forward mode autodiff can be implemented in a practical way.

More info

I have written a basic dual number library in C++, which you can find on GitHub. However, once I finished writing the class and a few function overloads, I gave up because of the problem I describe above (DualNumbers.cpp has several examples, though).

Recently I started again, this time using expression templates (because I wanted to learn how to use them) -- see GitHub -- but this approach has another issue, which I describe in another question.

dkaramit
  • Second derivatives are efficiently computed by combining a forward and a backward sweep. Univariate higher derivatives are computed by extending the dual number concept to higher-degree truncated Taylor series. Multivariate higher derivatives can be computed by combining Taylor expansions in different directions, but that quickly runs into numerical stability issues. A direct manipulation of the derivative tensors is possible but a logistical nightmare. – Lutz Lehmann Jun 19 '21 at 05:52
  • This is interesting. So, is there any resource that explains how truncated Taylor series can be implemented? – dkaramit Jun 20 '21 at 01:29
  • For the Taylor arithmetic, the usual resources about automatic differentiation, foremost Griewank et al. Then FADBAD++ with the TADIFF module, it has well-documented use-cases. For the computation of higher derivatives, most information is probably still only in research papers. – Lutz Lehmann Jun 20 '21 at 05:33
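As a rough illustration of the univariate Taylor arithmetic mentioned in the comments, the sketch below stores the coefficients f^(k)(x0)/k! up to a fixed degree N and propagates them through `+`, `*` and `exp`. This is an assumption-laden toy, not the API of FADBAD++/TADIFF or any other library; the names `taylor`, `Series`, `variable` and `derivative` are hypothetical.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

namespace taylor {

// Hypothetical truncated Taylor series of degree N in one variable t,
// expanded around x0: c[k] holds f^(k)(x0) / k!.
template <std::size_t N>
struct Series {
    static_assert(N >= 1, "degree must be at least 1");
    std::array<double, N + 1> c{};  // zero-initialized coefficients
};

// The independent variable x = x0 + t.
template <std::size_t N>
Series<N> variable(double x0) {
    Series<N> s;
    s.c[0] = x0;
    s.c[1] = 1.0;
    return s;
}

template <std::size_t N>
Series<N> operator+(const Series<N>& a, const Series<N>& b) {
    Series<N> r;
    for (std::size_t k = 0; k <= N; ++k) r.c[k] = a.c[k] + b.c[k];
    return r;
}

// Cauchy product, truncated at degree N.
template <std::size_t N>
Series<N> operator*(const Series<N>& a, const Series<N>& b) {
    Series<N> r;
    for (std::size_t k = 0; k <= N; ++k)
        for (std::size_t j = 0; j <= k; ++j)
            r.c[k] += a.c[j] * b.c[k - j];
    return r;
}

// e = exp(a) satisfies e' = a' * e, which gives the coefficient recurrence
// k * e_k = sum_{j=1..k} j * a_j * e_{k-j}.
template <std::size_t N>
Series<N> exp(const Series<N>& a) {
    Series<N> e;
    e.c[0] = std::exp(a.c[0]);
    for (std::size_t k = 1; k <= N; ++k) {
        double s = 0.0;
        for (std::size_t j = 1; j <= k; ++j) s += j * a.c[j] * e.c[k - j];
        e.c[k] = s / k;
    }
    return e;
}

// Recover the k-th derivative (k <= N) from the k-th coefficient.
template <std::size_t N>
double derivative(const Series<N>& s, std::size_t k) {
    double factorial = 1.0;
    for (std::size_t i = 2; i <= k; ++i) factorial *= i;
    return factorial * s.c[k];
}

}  // namespace taylor
```

For example, with `auto x = taylor::variable<3>(1.0);` and `auto f = x * taylor::exp(x);`, the call `taylor::derivative(f, 3)` gives the third derivative of x·exp(x) at x = 1, which analytically is (x + 3)·exp(x) = 4e. A degree-1 `Series` reduces to an ordinary dual number, so this is the same idea carried to higher order in a single direction; the multivariate case is where the difficulties described in the comments begin.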
