I'm trying to implement some of the AD algorithms myself, but I don't quite get the edge-pushing algorithm by Gower and Mello for computing sparse Hessians.
Does a new computational graph of the "original gradient" need to be generated? For example, when computing the Hessian of x^2, should a graph for 2*x be built in order to obtain the second derivative, 2? I ask because the paper states that the dotted arcs represent "non-linear interactions". And how exactly are the adjoints accumulated to form the second derivatives?
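For concreteness, here is a minimal sketch (my own toy Python, not code from the paper) of the naive "differentiate the derivative" approach I have in mind: nesting forward-mode dual numbers so that the derivative computation for x^2 is itself differentiated, which implicitly builds the graph for 2*x and then yields 2. I'm asking whether edge pushing avoids constructing anything like this second graph.

```python
class Dual:
    """Forward-mode dual number: carries a value and a tangent (dot)."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)
    __rmul__ = __mul__


def f(x):
    return x * x  # f(x) = x^2


# Nest duals ("forward-over-forward"): the outer level differentiates
# the inner derivative computation, i.e. it differentiates f'(x) = 2x.
x = Dual(Dual(3.0, 1.0), Dual(1.0, 0.0))
y = f(x)

# y.val.val  -> f(3)   = 9.0
# y.val.dot  -> f'(3)  = 6.0
# y.dot.dot  -> f''(3) = 2.0
```

This works, but it literally applies AD to the derivative computation, which is what prompts my question about how it differs from symbolic differentiation.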
Also, if a new graph is needed, how does that differ from symbolic differentiation? Thanks!