
I have code for an input-output nonlinear function that takes a list of inputs X and weights W and produces a single nonlinear output. I am interested in using this as my "neuron" and seeing whether I can train it with back-propagation. (Ideally I'd have a number of these neurons chained together, but a single one is fine for now.)

I've asked before whether it is possible to train a many-to-one nonlinear function, and the answers seem to suggest that it is straightforward to use autograd to do the backpropagation.
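
To make sure I understand what "use autograd" means in practice, here is a minimal toy example of the workflow I think is being suggested. The toy_neuron below is just a placeholder many-to-one nonlinearity, not my actual function:

import torch

# Placeholder many-to-one nonlinearity: a single scalar output from inputs x and weights w
def toy_neuron(x, w):
    return torch.sum(torch.tanh(x * w))

x = torch.tensor([1.0, 0.0, 0.0])
w = torch.tensor([1.0, 0.5, 1.0], requires_grad=True)

out = toy_neuron(x, w)
out.backward()   # autograd backpropagates through the recorded graph
print(w.grad)    # gradient of the scalar output with respect to w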

import numpy as np
from scipy import integrate, special
from scipy.constants import epsilon_0

z_values = np.linspace(1e-10, 1 - 1e-10, 100)

# Read-in kernel: zeroth-order Bessel function of the first kind in Ec, scaled by Ep
def readinKernel(wdummy, z, Ec, Ep, kval=1):
    return (Ec * kval * special.jv(0, Ec * kval * np.sqrt(np.outer(z, (1 - wdummy)))) * Ep / np.cosh(np.arctanh(wdummy)))

# Smooth, differentiable approximation to a Heaviside step
def steep_sigmoid(x, k=50):
    return 1.0 / (1.0 + np.exp(-k*x))

# Read-out kernel: first-order Bessel function of the first kind, gated by the steep sigmoid
def readoutKernel(zdummy, z, B_in, Ec, kval=1):
    return (1 / np.sqrt(np.maximum(1e-10, np.subtract.outer(z, zdummy))) * 
            special.jv(1, 2 * Ec * kval * 
                       np.sqrt(np.maximum(1e-10, np.subtract.outer(z, zdummy)))) *
            Ec * kval *
            steep_sigmoid(np.subtract.outer(z, zdummy), 50) * 
            np.repeat(B_in, len(zdummy)).reshape(len(B_in), len(zdummy)))

# One update of the spin wave: read-in integral minus read-out integral, added to the current spin wave
def spinwave_recursive_calculation(B_in, z_values, Ec, Ep):

    wdummy_values = np.linspace(1e-10, 1-1e-10, 100)
    zdummy_values = np.linspace(1e-10, 1-1e-10, 100)

    readin_values = readinKernel(wdummy_values, z_values, Ec, Ep)
    readout_values = readoutKernel(zdummy_values, z_values, B_in, Ec)

    readin_integrals = np.trapz(readin_values, wdummy_values, axis=1)
    readout_integrals = np.trapz(readout_values, zdummy_values, axis=1)

    spinwave = readin_integrals - readout_integrals + B_in
    return spinwave

def input_output_nonlinearity(x, w):

    Bin = np.zeros(len(z_values))
    BoutMatrix = np.tile(Bin, [len(w),1])

    for i in range(len(w)):
        E_c_val = w[i]
        E_p_val = x[i]
        # Originally this was a recursive function that returned a single output array, but now I have implemented it as a for loop that returns a matrix of outputs.
        # This was to try to avoid a potential bug in pytorch, as recommended by the user DerekG. (I'm not sure my implementation fixes this issue though.)
        # Bout = spinwave_recursive_calculation(Bin, z_values, E_c_val, E_p_val)
        BoutMatrix[i, :] = spinwave_recursive_calculation(Bin, z_values, E_c_val, E_p_val)
        Bin = BoutMatrix[i, :]
    # Sum the magnitude of the final spin wave (the last row of BoutMatrix)
    output = np.sum(np.abs(Bin))
    print(output)
    return output

x = np.array([1, 0, 0])
w = np.array([1, .5, 1])

input_output_nonlinearity(x, w)

Is it straightforward to do back-propagation with this code? Any ideas how to proceed with training?

Steven Sagona
  • I'd start converting ops to pytorch equivalents to see where there is or isn't an analog. I don't know enough about the Bessel function to know whether it is differentiable, and of course ops like `abs` are locally non-differentiable. The heaviside step function is non-differentiable, so you'd likely need to replace that or implement your own `backward` function. Other than that, most of the operations you use should have workable backward calls implemented out of the box – DerekG Aug 10 '23 at 12:59
  • @DerekG, Thanks, I can get rid of the abs() without any issues, and I'll replace the heaviside function. So I'll make a version of this with the torch functions in a new edit soon. – Steven Sagona Aug 10 '23 at 13:06
  • One other potential issue is due to looping. If you store the results of the recursive function calls in the same variable name each time, the result will overwrite the previous one and the computation graph will be fragmented. You will need to create an array of outputs indexed as `Bout[i] = spinwave...`, I suspect – DerekG Aug 10 '23 at 13:09
  • @DerekG, thanks again for the tip. To be clear, you think that I need to hand-write the loop as individual lines of code because there's concern that the pytorch functions might do some buggy rewriting if I leave it as a loop? In the case where the loop has 100 elements, that would be 100 lines of code. – Steven Sagona Aug 10 '23 at 13:11
  • No no, I am saying you need to store each output of each iteration of the loop as a separately named variable (or, more elegantly, as elements in a larger array) until you backpropagate. Pytorch's computation graph keeps track of, say, `A` is the result of multiplying `B` and `C` together, so the gradient of `A` with respect to `B` is `C`. It does not keep track of the fact that `B` had a different value at the time of that multiplication and was subsequently reassigned a new value before backpropagation – DerekG Aug 10 '23 at 13:22
  • But also, 100 iterations and you'll definitely have a vanishing or exploding gradient, so this may not produce reliable results at that level of iteration. If you can find a way to re-define this function non-recursively that may yield better results – DerekG Aug 10 '23 at 13:25
  • @DerekG, not sure if it can be expressed as anything but a recursive function... Do you have a way of explaining why this would end up causing gradient problems? Also, thanks to your help, I've edited my code in the question to include your suggestions (if you have a chance, maybe check my for-loop and see if I implemented your suggestion correctly?), and will soon add a version with the pytorch functions (a rough sketch of what I have in mind follows these comments). – Steven Sagona Aug 10 '23 at 13:56
  • Essentially, if you have a sequence of terms x such that the partial of x_n with respect to x_(n-1) is even slightly above or below unity (e.g. 0.1 or 2), then the partial of x_100 with respect to x_0 is on the order of 0.1^100 or 2^100, which are extremely small or large, often outside the faithful resolution of floating-point computations. You can read more about the vanishing gradient problem; it was the limiting issue with recurrent neural networks 8 or so years ago – DerekG Aug 10 '23 at 17:04
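
Following these suggestions, here is a rough sketch of the torch version I have in mind. The spinwave_step below is only a hypothetical placeholder for a torch rewrite of spinwave_recursive_calculation (the Bessel-function parts would still need torch-compatible equivalents); the point is the loop that stores every iteration's output and the sigmoid replacement for the step function:

import torch

def steep_sigmoid_torch(x, k=50.0):
    # Differentiable stand-in for a Heaviside step, as suggested in the comments
    return torch.sigmoid(k * x)

def spinwave_step(B_in, z, Ec, Ep):
    # HYPOTHETICAL placeholder for a torch version of spinwave_recursive_calculation
    return B_in + Ep * torch.tanh(Ec * z) * steep_sigmoid_torch(z - 0.5)

z = torch.linspace(1e-10, 1 - 1e-10, 100)
x = torch.tensor([1.0, 0.0, 0.0])
w = torch.tensor([1.0, 0.5, 1.0], requires_grad=True)

B = torch.zeros_like(z)
B_history = []                  # keep each iteration's output instead of overwriting it
for i in range(len(w)):
    B = spinwave_step(B, z, w[i], x[i])
    B_history.append(B)

loss = torch.sum(torch.square(B_history[-1]))   # square instead of abs, which is non-differentiable at 0
loss.backward()
print(w.grad)                   # gradients of the loss with respect to the weights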
