I have been hard at work learning how to write a custom neural network from scratch, without a library of any sort.
It is a fairly complicated concept, but I have read enough and watched enough videos to think I get it.
But my network is not reliable.
And weirdly, sometimes it goes backwards, as in my prediction confidence goes from high to low.
Here is what I have understood so far.
Going forward it's:
weights * inputs + bias
I wrote a dot product that, as far as I can tell, is working fine.
Then it's the activation, to see whether that particular node is firing or not.
In my hidden layers I'm using ReLU, and in my output layer I'm using Softmax; those seem to be working fine also.
Then I calculate the error, for which I'm using cross entropy with one-hot encoded targets.
As far as I can see, my forward pass works fine.
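To be concrete, here is a stripped-down sketch of what I mean by the forward pass, using plain functions and arrays with made-up names (not my actual classes):

```
// Weighted sum for one node: dot(weights, inputs) + bias
function weighted_sum(weights, inputs, bias) {
    return weights.reduce((sum, w, i) => sum + w * inputs[i], 0) + bias;
}

// ReLU for the hidden layers
function relu(x) {
    return Math.max(0, x);
}

// Softmax over the output layer's raw values
function softmax(values) {
    let max = Math.max(...values);                 // subtract the max for numerical stability
    let exps = values.map(v => Math.exp(v - max));
    let total = exps.reduce((a, b) => a + b, 0);
    return exps.map(e => e / total);
}

// Cross-entropy loss against a one-hot target vector
function cross_entropy(predictions, one_hot_target) {
    return -predictions.reduce((sum, p, i) =>
        sum + one_hot_target[i] * Math.log(p + 1e-12), 0); // small epsilon avoids log(0)
}
```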
But going backward is, I think, where my understanding fails.
I spent a lot of time trying to understand backward propagation and I'm almost there.
First off:
I understand that back propagation measures how much each weight contributed to the error, so you know how much to change each one to make the prediction better - and gradient descent is the step that actually applies those changes.
Because the error depends on so many intermediate variables, it becomes a chain rule over multiple variables.
If you summarize that concept, it becomes:
∂Error / ∂weight

or, spelled out:

On the output layer, because the error is directly connected to the output:
(∂Error[index] / ∂Output[index]) * (∂Output[index] / ∂Input[index]) * (∂Input[index] / ∂weight)

On a hidden layer, because the error is an accumulation of the total error of the previous layer (previous in the back-propagation direction):
(∂ErrorTotal[index] / ∂Output[index]) * (∂Output[index] / ∂Input[index]) * (∂Input[index] / ∂weight)

where ErrorTotal is the sum of all the errors from that previous layer, each multiplied by the weight that connects the current node at the current index to the corresponding node in that layer.
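To make the output-layer case concrete, here is how I understand it for a single weight, using softmax plus cross entropy, where the first two terms collapse to (output - target) - that collapse is a standard simplification for this pairing. The function name below is made up, purely for illustration:

```
// Gradient of the error with respect to ONE output-layer weight,
// assuming softmax activation paired with cross-entropy loss.
// For that pairing, dError/dOutput * dOutput/dInput collapses to (output - target).
function output_weight_gradient(output_value, target_value, incoming_activation) {
    let delta = output_value - target_value;    // how far the node's output is from its one-hot target
    return delta * incoming_activation;         // dInput/dweight is the value that fed this weight
}

// Example: output 0.70, one-hot target 1, incoming hidden activation 0.35
// gradient = (0.70 - 1) * 0.35 = -0.105, so gradient descent would push this weight up
```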
Pragmatically you would:
1. calculate the derivative of the activation with respect to the value coming from the dot product
2. calculate the derivative of the dot product with respect to the weights (which is just the output of the node that was originally sent through that weight)

(There is a stripped-down sketch of these two steps after my code below.)
```
/**
 * In back propagation, this function sends the previous layer's derived activation
 * / error values to the current layer, calculates the current layer's error values,
 * and updates the previous layer's weights per node.
 * @param current_layer
 * @param previous_layer_in_back_propagation
 */
Pass_Backward(current_layer, previous_layer_in_back_propagation) {
    /*
     You have to go backward - the current index increments after a full iteration
     of the previous weight index. All the weights have to point to the same current
     node, so: current forEach, nested previous forEach.
    */
    let inside_last_layer = (previous_layer_in_back_propagation.Get_Meta_Tag() == Enumerations.Meta_Tags.Output_Layer);
    let activated_error_derivatives = null;
    if (!inside_last_layer) {
        activated_error_derivatives = previous_layer_in_back_propagation.Get_Entire_Node_List_Total_Error_Values_As_Array();
    }
    else {
        activated_error_derivatives = previous_layer_in_back_propagation.Get_Entire_Node_List_Derived_Activation_Values_As_Array();
    }
    current_layer.Get_Entire_Node_List().forEach((current_node, current_node_index) => {
        // training the last layer
        let dot_product = null;
        previous_layer_in_back_propagation.Get_Entire_Node_List().forEach((previous_node, previous_node_index) => {
            let weights_derivatives = [];
            previous_node.Get_Weights().forEach((previous_nodes_weight, previous_nodes_weight_index) => {
                weights_derivatives.push(previous_node.Get_Derived_Activation_Value() * current_node.Get_Value());
            });
            previous_node.Set_Weights_Derivative_Values(weights_derivatives);
            dot_product = Maths.Dot_Product(activated_error_derivatives, weights_derivatives);
        });
        current_node.Set_Nodes_Total_Error(current_node.Get_Value() * dot_product);
    });
    this.Update_Weights_In_Backpropagation(previous_layer_in_back_propagation);
}
```
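For comparison, here is the stripped-down sketch of what I believe those two pragmatic steps should compute for a single hidden-layer weight, using plain numbers instead of my node classes (all names here are invented for illustration only):

```
// Toy sketch of the gradient for ONE weight feeding a hidden ReLU node.
//   downstream_errors   - error terms of the layer after this node (the "previous layer" in back propagation)
//   connecting_weights  - the weights that connect this node to each of those downstream nodes
//   pre_activation      - this node's dot-product value, before ReLU was applied
//   incoming_activation - the value that was multiplied by this weight in the forward pass
function hidden_weight_gradient(downstream_errors, connecting_weights, pre_activation, incoming_activation) {
    // ErrorTotal: the downstream errors pulled back through the connecting weights
    let error_total = downstream_errors.reduce((sum, err, i) => sum + err * connecting_weights[i], 0);

    // step 1: derivative of ReLU with respect to the dot-product value
    let relu_derivative = pre_activation > 0 ? 1 : 0;

    // step 2: derivative of the dot product with respect to this weight,
    // which is simply the input that came through the weight
    return error_total * relu_derivative * incoming_activation;
}

// the weight would then move by plain gradient descent:
// weight = weight - learning_rate * hidden_weight_gradient(...)
```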
But like I said, I'm missing something.
My network is not reliable:
sometimes it learns - confidence in the prediction increases
sometimes it's all over the place
sometimes it unlearns - confidence in the prediction decreases
Please help.
I have included my backward propagation above; it runs after I derive my activation values.
I have read so much and watched so many tutorials and written so many variations, and I haven't got it yet.
UPDATE: The predictions become unstable when I introduce negative weight initializations into the system.
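For reference, this is the kind of weight initialization I keep seeing recommended for ReLU layers (He initialization), which deliberately includes negative values - adding a sketch here in case it is relevant; the function name is made up and this is not my current scheme:

```
// He initialization for a node with n_inputs incoming connections:
// weights drawn from a normal distribution with mean 0 and standard
// deviation sqrt(2 / n_inputs), so roughly half of them are negative.
function he_initialize_weights(n_inputs) {
    let std_dev = Math.sqrt(2 / n_inputs);
    let weights = [];
    for (let i = 0; i < n_inputs; i++) {
        // Box-Muller transform to sample from a standard normal distribution
        let u1 = Math.random() || Number.EPSILON; // avoid log(0)
        let u2 = Math.random();
        let standard_normal = Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
        weights.push(standard_normal * std_dev);
    }
    return weights;
}
```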