Can automatic differentiation (e.g. TensorFlow/PyTorch) intelligently backpropagate through a [L2] neuron (of size batchSize) whose state is incrementally updated over time (w words), where each update adds contributions from both previously activated and newly activated batch elements? In other words, can it update the incoming weights/subnetwork based on the specific batch subset of activations of the previous-layer [L1] neurons that were used (at time w) to incrementally update the [L2] neuron (batch subset)?
Pseudocode:
#numberOfWords = 2
#batchSize = 5
#numberInputFeatures = 2
#x.shape = numberOfWords*batchSize*numberInputFeatures
#L1numberOfNeurons = 2 #optional: or L1numberOfNeurons = numberInputFeatures
#L2numberOfNeurons = 1
#L1neuronsState.shape = batchSize*L1numberOfNeurons
#L2neuronState.shape = batchSize
#W1.shape = numberInputFeatures*L1numberOfNeurons #optional
#W2.shape = L1numberOfNeurons*L2numberOfNeurons
def forwardPropagateIncrementallyUpdateState(x, targetState):
    L2neuronState = zeros(size=batchSize)
    for w in range(numberOfWords):
        inputFeatures = x[w] #shape: batchSize*numberInputFeatures
        L1neuronsState = activationFunction(matmul(inputFeatures, W1)) #optional: or L1neuronsState = inputFeatures
        L2neuronUpdate = activationFunction(matmul(L1neuronsState, W2))
        L2neuronState = L2neuronState + L2neuronUpdate #incremental update across words
    loss = L2neuronState - targetState
    return loss
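For reference, here is a minimal runnable PyTorch sketch of the same loop. It assumes ReLU for activationFunction and a squared-error loss (the pseudocode returns a raw difference); the activeMask/torch.where step is my assumption about how "newly activated batch elements" select the per-word batch subset, not part of the original pseudocode:

    import torch

    numberOfWords = 2
    batchSize = 5
    numberInputFeatures = 2
    L1numberOfNeurons = 2
    L2numberOfNeurons = 1

    # x.shape = numberOfWords x batchSize x numberInputFeatures
    x = torch.randn(numberOfWords, batchSize, numberInputFeatures)
    targetState = torch.randn(batchSize)

    W1 = torch.randn(numberInputFeatures, L1numberOfNeurons, requires_grad=True)
    W2 = torch.randn(L1numberOfNeurons, L2numberOfNeurons, requires_grad=True)

    def forwardPropagateIncrementallyUpdateState(x, targetState):
        L2neuronState = torch.zeros(batchSize)
        for w in range(numberOfWords):
            inputFeatures = x[w]                                # batchSize x numberInputFeatures
            L1neuronsState = torch.relu(inputFeatures @ W1)     # batchSize x L1numberOfNeurons
            L2neuronUpdate = torch.relu(L1neuronsState @ W2).squeeze(-1)  # batchSize
            # assumption: "activated" batch elements are those with a nonzero update;
            # torch.where applies the update only to that subset while keeping the
            # autograd graph intact for exactly those elements
            activeMask = L2neuronUpdate > 0
            L2neuronState = torch.where(activeMask,
                                        L2neuronState + L2neuronUpdate,
                                        L2neuronState)
        loss = ((L2neuronState - targetState) ** 2).mean()
        return loss

    loss = forwardPropagateIncrementallyUpdateState(x, targetState)
    loss.backward()
    print(W1.grad)  # nonzero only via the batch elements/timesteps that contributed
    print(W2.grad)

Because L2neuronState is rebuilt functionally each iteration (state = state + update) rather than mutated in place, autograd unrolls the loop like an RNN and loss.backward() reaches W1/W2 through every incremental update, restricted to the batch elements that actually contributed at each step. The main practical caveat is in-place assignment (e.g. indexing writes into a tensor autograd has already saved), which can raise a version-counter error.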