Can automatic differentiation (e.g. TensorFlow/PyTorch) intelligently backpropagate through a [L2] neuron (of size batchSize) whose state is incrementally updated over time (w words), where each update adds contributions from both previously activated and newly activated batch elements? In other words, can it update the incoming weights/subnetwork based on the specific batch subset of activations of the previous-layer [L1] neurons that were used (at time w) to incrementally update the [L2] neuron (batch subset)?
Pseudocode:
#numberOfWords = 2
#batchSize = 5
#numberInputFeatures = 2
#x.shape = numberOfWords*batchSize*numberInputFeatures
#L1numberOfNeurons = 2 #optional: or L1numberOfNeurons = numberInputFeatures
#L2numberOfNeurons = 1
#L1neuronsState.shape = batchSize*L1numberOfNeurons
#L2neuronState.shape = batchSize
#W1.shape = numberInputFeatures*L1numberOfNeurons #optional
#W2.shape = L1numberOfNeurons*L2numberOfNeurons
def forwardPropagateIncrementallyUpdateState(x, targetState):
    L2neuronState = zeros(size=batchSize)
    for w in range(numberOfWords):
        inputFeatures = x[w] #shape: batchSize*numberInputFeatures
        L1neuronsState = activationFunction(matmul(inputFeatures, W1)) #optional: or L1neuronsState = inputFeatures
        L2neuronUpdate = activationFunction(matmul(L1neuronsState, W2))
        L2neuronState = L2neuronState + L2neuronUpdate #incremental update across words
    loss = L2neuronState - targetState
    return loss
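For reference, here is a minimal runnable PyTorch sketch of the same loop. It assumes ReLU for activationFunction and a squared-error loss (the pseudocode returns a raw difference); the activeMask/torch.where step is my assumption about how "newly activated batch elements" select the per-word batch subset, not part of the original pseudocode:

    import torch

    numberOfWords = 2
    batchSize = 5
    numberInputFeatures = 2
    L1numberOfNeurons = 2
    L2numberOfNeurons = 1

    # x.shape = numberOfWords x batchSize x numberInputFeatures
    x = torch.randn(numberOfWords, batchSize, numberInputFeatures)
    targetState = torch.randn(batchSize)

    W1 = torch.randn(numberInputFeatures, L1numberOfNeurons, requires_grad=True)
    W2 = torch.randn(L1numberOfNeurons, L2numberOfNeurons, requires_grad=True)

    def forwardPropagateIncrementallyUpdateState(x, targetState):
        L2neuronState = torch.zeros(batchSize)
        for w in range(numberOfWords):
            inputFeatures = x[w]                                # batchSize x numberInputFeatures
            L1neuronsState = torch.relu(inputFeatures @ W1)     # batchSize x L1numberOfNeurons
            L2neuronUpdate = torch.relu(L1neuronsState @ W2).squeeze(-1)  # batchSize
            # assumption: "activated" batch elements are those with a nonzero update;
            # torch.where applies the update only to that subset while keeping the
            # autograd graph intact for exactly those elements
            activeMask = L2neuronUpdate > 0
            L2neuronState = torch.where(activeMask,
                                        L2neuronState + L2neuronUpdate,
                                        L2neuronState)
        loss = ((L2neuronState - targetState) ** 2).mean()
        return loss

    loss = forwardPropagateIncrementallyUpdateState(x, targetState)
    loss.backward()
    print(W1.grad)  # nonzero only via the batch elements/timesteps that contributed
    print(W2.grad)

Because L2neuronState is rebuilt functionally each iteration (state = state + update) rather than mutated in place, autograd unrolls the loop like an RNN and loss.backward() reaches W1/W2 through every incremental update, restricted to the batch elements that actually contributed at each step. The main practical caveat is in-place assignment (e.g. indexing writes into a tensor autograd has already saved), which can raise a version-counter error.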