
I have read through a lot of articles and my brain is drained, so I need a fresh perspective on the concept of mini-batches. I am new to machine learning and would appreciate any advice on whether my process is correct. Here is my premise:

I have a dataset with 355 features and 8 output classes, 12,200 samples in total. This is a rough visualization of my neural network: [Neural Network Sketch]

I decided on 181 neurons for Hidden Layer 1 and 96 neurons for Hidden Layer 2. I used ReLU activation for the hidden layers and a logistic (sigmoid) activation for the output layer.
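For concreteness, here is a minimal NumPy sketch of the forward pass for that architecture. The He-style initialization, the random toy batch, and treating "Logistic" as an element-wise sigmoid over the 8 output units are my own assumptions for illustration:

    import numpy as np

    rng = np.random.default_rng(0)

    # Layer sizes from the question: 355 inputs -> 181 -> 96 -> 8 outputs.
    # He-style initialization (an assumption) pairs well with ReLU layers.
    W1 = rng.normal(0.0, np.sqrt(2.0 / 355), size=(355, 181)); b1 = np.zeros(181)
    W2 = rng.normal(0.0, np.sqrt(2.0 / 181), size=(181, 96));  b2 = np.zeros(96)
    W3 = rng.normal(0.0, np.sqrt(2.0 / 96),  size=(96, 8));    b3 = np.zeros(8)

    def relu(z):
        return np.maximum(0.0, z)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def forward(X):
        """Forward-propagate a whole mini-batch X of shape (batch_size, 355)."""
        h1 = relu(X @ W1 + b1)
        h2 = relu(h1 @ W2 + b2)
        return sigmoid(h2 @ W3 + b3)        # shape (batch_size, 8)

    X_batch = rng.normal(size=(8, 355))     # one mini-batch of 8 samples
    print(forward(X_batch).shape)           # -> (8, 8)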

To do mini-batch training, I set my batch size to 8, so I have a total of 1,525 batches with 8 samples per batch. Here are my steps:

  1. Get the 1st batch of data (8 sets of 355 inputs and 8 outputs).
  2. Forward-propagate the batch.
  3. Get the errors and calculate the sum of squared errors. For this, I average over the batch first, using the formula SumError = (1/8) * sum(error^2).
  4. Back-propagate the batch.
  5. Get the average of the weight values after back-propagation.
  6. Use the new weights as the weights for the next batch.
  7. Get the next batch of data (8 sets of 355 inputs and 8 outputs).
  8. Repeat 2-7 using the new set of weights.
  9. When all batches are done, average the SumError values to get the sum of squares per epoch.
  10. Repeat 1-9 until the SumError per epoch is small.
  11. Get the final weights to be used for validation.

That is my mini-batch process. Is it correct? Specifically, regarding the weights: do I use the weights calculated after each batch as the input weights for the next batch? Or do I collect all the weights first (with the starting weights used for every batch), average the weights across all the batches, and then use the averaged weights as input to the next epoch?
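For concreteness, here is a toy sketch of the two schemes being asked about. It uses a plain linear model with a squared-error loss instead of the actual 355-181-96-8 network, and the data, learning rate, and batch layout are made up purely for illustration:

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy problem: a linear model with squared-error loss, so the
    # gradient has a closed form and the update logic stays in focus.
    X = rng.normal(size=(40, 5))
    y = X @ np.array([1.0, -2.0, 0.5, 3.0, -1.0]) + 0.1 * rng.normal(size=40)
    lr, batch_size = 0.01, 8

    def grad(w, Xb, yb):
        # Gradient of the mean squared error on one batch.
        return 2.0 / len(yb) * Xb.T @ (Xb @ w - yb)

    # Option 1 (standard mini-batch SGD): the weights produced by one
    # batch are the starting point for the very next batch.
    w = np.zeros(5)
    for s in range(0, len(y), batch_size):
        w = w - lr * grad(w, X[s:s + batch_size], y[s:s + batch_size])

    # Option 2 (the alternative described above): every batch starts
    # from the same epoch-start weights, and the per-batch results are
    # averaged once per epoch.
    w0 = np.zeros(5)
    results = [w0 - lr * grad(w0, X[s:s + batch_size], y[s:s + batch_size])
               for s in range(0, len(y), batch_size)]
    w_avg = np.mean(results, axis=0)

Note that in the second scheme, averaging the per-batch results is the same as averaging the per-batch gradients, so the whole epoch collapses into a single full-batch gradient step.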

axia_so2
  • in step 5 are you finding multiple derivatives for the same model parameters? It sounds like you are finding different model param derivative vectors for each data point in the batch, is that what you are doing? – Vass Jan 31 '23 at 00:18

1 Answer

Actually, you have to define your epochs, and each epoch should sweep over all of your input data at least once (not just repeating steps 2-7). Within each epoch the weights are updated mini-batch by mini-batch, and you repeat the steps until all the epochs are finished.
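For illustration, here is a minimal, runnable sketch of the scheme described here: the weights are updated immediately after every mini-batch, and each epoch sweeps over all the data once. A toy single-layer sigmoid model stands in for the real network, and the toy data, learning rate, and epoch count are assumptions:

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy multi-label data standing in for the real 355-feature dataset.
    X = rng.normal(size=(200, 10))
    Y = (X @ rng.normal(size=(10, 3)) > 0).astype(float)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    W = np.zeros((10, 3))                    # a single logistic layer, for brevity
    lr, batch_size, n_epochs = 0.5, 8, 30

    for epoch in range(n_epochs):
        order = rng.permutation(len(X))      # visit all the data once per epoch
        sse = 0.0
        for start in range(0, len(X), batch_size):
            idx = order[start:start + batch_size]
            P = sigmoid(X[idx] @ W)          # forward pass on this batch
            err = P - Y[idx]
            sse += np.sum(err ** 2)
            # Gradient of the squared error (constant factor folded into lr).
            dW = X[idx].T @ (err * P * (1 - P)) / len(idx)
            W -= lr * dW                     # update NOW: the next batch
                                             # starts from these weights
        print(epoch, sse / len(X))           # per-epoch error trends downward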

Ran
  • Hi Ran, thanks for the reply. So basically, I will set it to run, say, Epoch = 20. Each epoch will run mini-batches with a batch size of 8, meaning each epoch runs 1,525 batches. I will collect all the delta weights first, then average them and update the weights to be fed to the next epoch. Is this correct? – axia_so2 Nov 03 '21 at 07:05
  • Hi Axia, no, you don't collect them all, average them, and update the weights in one go. Think of it like this: in each epoch, every mini-batch updates the weights immediately. The purpose is to quickly find the right direction according to each batch's gradient. The data inside each mini-batch also differs from batch to batch, so with each successive mini-batch update the latest weights are further along than the previous ones. Finally, over a number of epochs, you can see the error decrease and you will get the best weights. – Ran Nov 04 '21 at 02:04
  • Wow. Thanks so much Ran! It is clearer to me now. Really appreciate you taking the time to answer my questions. Cheers! – axia_so2 Nov 04 '21 at 02:11
  • No worries :) If you like my answer you can give me a thumbs up haha – Ran Nov 04 '21 at 05:41
  • I was actually going to like and accept the answer and give a thumbs up, but I can't seem to do it since it was made as a comment. Sorry, still a noob at Stack. I think it must be posted as an answer for me to accept it as the solution? Or am I also missing something to give a thumbs up? – axia_so2 Nov 04 '21 at 12:09
  • okay found it. haha! – axia_so2 Nov 04 '21 at 12:09