
I am implementing the ReLU function for use with AutoDiff and tried several variants. The implementations based on conditional statements differentiate correctly, but the one based on std::max returns a derivative of 0 over the whole range.

For the input vector dual in[] = { -3.0, -1.5, -0.1, 0.1, 1.5, 3.0 }, the call derivative(ReLU, wrt(y), at(y)) with y = in[i] gives correct results if ReLU is implemented as:

dual ReLU_ol(dual x) {
    return (x > 0) * x;             // ok; autodiff gives x > 0 ? 1 : 0
}
dual ReLU_if(dual x) {
    // ok; autodiff gives x > 0 ? 1 : 0
    if (x > 0.0)  {
        return x;
    } else {
        return 0.0;
    }
}

that is, the derivative is one for x > 0 and zero elsewhere.

When ReLU is implemented in the form:

dual ReLU_max(dual x) {
    return std::max(0.0, (double)x); // wrong derivative: 0 for every x
}

the derivative comes out as zero over the whole range.

I expected std::max (and std::min) to work correctly with automatic differentiation. Am I doing something wrong, or am I misunderstanding something?

In the resulting plot, d / dx is calculated with AutoDiff; the purple and blue lines overlap, and the results for ReLU_ol are not plotted.

Arek
  • I would like to see the whole definition of the function (mainly the cv-qualifiers of the return type) and how you call it. I suspect that it is either templated or uses automatic type deduction. – Revolver_Ocelot Oct 26 '22 at 10:04
  • I completed/edited the question – Arek Oct 26 '22 at 10:31
  • What is `dual`? In the first and second examples you return it unchanged; in the problematic one you construct a new one from a `double`. Are you sure you are not losing information here? – Revolver_Ocelot Oct 26 '22 at 10:56
  • @Revolver_Ocelot, it is two doubles (according to https://en.wikipedia.org/wiki/Dual_number), where the second part (as far as I understand) is used to determine the error (imaginary part) of the calculated derivative at a point. In my tests, I didn't notice a loss of accuracy. Nevertheless, one way around the problem is that AutoDiff contains its own definitions of the max/min functions (I didn't notice this earlier), so I guess the problem is hard to solve within AutoDiff::derivative(?) – Arek Oct 28 '22 at 10:44
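
The loss of information raised in the comments can be reproduced without the library. The following is a minimal sketch, assuming a simplified forward-mode dual number: the `Dual` struct, `relu_cast`, `dual_max` and `relu_dual` are hypothetical names made up for illustration and are not autodiff's actual types or functions. It shows that casting to `double` before calling std::max drops the derivative part, while a max that selects between whole dual values keeps it.

```cpp
#include <algorithm>

// Hypothetical minimal forward-mode dual number; NOT autodiff's dual type.
struct Dual {
    double val;   // function value
    double grad;  // derivative carried alongside the value
    Dual(double v = 0.0, double g = 0.0) : val(v), grad(g) {}
    explicit operator double() const { return val; }  // the cast keeps only val
};

// Mirrors ReLU_max: the cast discards grad, std::max compares plain doubles,
// and the double result is converted back into a Dual with grad == 0.
Dual relu_cast(Dual x) {
    return std::max(0.0, static_cast<double>(x));
}

// A max that selects between whole Dual objects, so grad travels with val.
Dual dual_max(Dual a, Dual b) {
    return (a.val < b.val) ? b : a;
}

// ReLU built on dual_max: the derivative is 1 for x > 0 and 0 otherwise.
Dual relu_dual(Dual x) {
    return dual_max(Dual(0.0), x);
}
```

Seeding the input as Dual x(1.5, 1.0), relu_cast(x).grad is 0 while relu_dual(x).grad is 1, which matches the symptom in the question. Under this assumption, the zero derivative comes from the (double) cast in ReLU_max rather than from the derivative call itself.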
