
I have a very large training set, too big to train in one pass in Matlab, so I need to do large-scale training.

Is it possible to split the training set into parts and train the network iteratively, updating "net" on each iteration instead of overwriting it?

The code below shows the idea, but it doesn't work: in each iteration it updates the net depending only on the most recent chunk of trained data.

TF1 = 'tansig'; TF2 = 'tansig'; TF3 = 'tansig'; % transfer functions of the layers; TF3 is for the output layer

net = newff(trainSamples.P, trainSamples.T, [NodeNum1,NodeNum2,NodeOutput], {TF1 TF2 TF3}, 'traingdx'); % network created

net.trainFcn            = 'traingdm';
net.trainParam.epochs   = 1000;
net.trainParam.min_grad = 0;
net.trainParam.max_fail = 2000; % large value, effectively infinity

while(1) % iteratively take 10 data points at a time
    p % => gets updated with the next 10 data points
    t % => gets updated with the next 10 data points

    [net, tr] = train(net, p, t);

end
  • How do you know it is overwriting instead of updating? Can you show a data example, since I don't quite understand what you mean by "it updates the net depending only on the trained data set"? Thanks – lennon310 Jan 18 '14 at 02:34
  • I am using a neural network on time series to predict multiple points ahead (90). If I use train(), in each iteration it overwrites the already trained portion. I observe this by comparing my predicted values with the actual values. Over time there was no improvement, and my predictions were based only on the small trained portion of the data. Instead of training 1,000,000 data points at once, I think it is more efficient to train iteratively. Currently I am trying the adapt() function, but I couldn't see any improvement in my predictions. That was my question: is it possible with adapt()? Thanks. – alper Jan 18 '14 at 02:42
  • The predictions are based only on the trained data. In each of my iterative training steps, I am not able to see the previously trained data's pattern or behaviour. – alper Jan 18 '14 at 02:46
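
Regarding the adapt() function mentioned in the comments above: a minimal sketch of how an incremental call looks, assuming p_chunk and t_chunk are hypothetical 10-sample chunks of inputs and targets:

    % Sketch only: adapt() makes an incremental pass over the chunk and
    % returns the updated network, instead of running a full training session.
    % p_chunk and t_chunk are hypothetical chunks of inputs/targets.
    for pass = 1:5                                 % a few incremental passes per chunk
        [net, y, e] = adapt(net, p_chunk, t_chunk);
    end
    chunk_mae = mean(abs(e(:)))                    % rough check of the per-chunk error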

2 Answers


I haven't had a chance to look at the adapt function yet, but I suspect the training is updating rather than overwriting. To verify this, you can select a subset of your first data chunk and use it as the second training chunk. If training overwrites, then after the second round the net should predict poorly on the points of the first chunk that are not in that subset.

I tested this with a very simple program: learning the curve y = x^2. In the first training process, I used the data set [1,3,5,7,9]:

   m = 6;                                  % number of hidden neurons
   P = [1 3 5 7 9];
   T = P.^2;
   % premnmx scales P and T to [-1, 1]; keep minP/maxP/minT/maxT so that
   % later chunks can be scaled with the same reference
   [Pn,minP,maxP,Tn,minT,maxT] = premnmx(P,T);
   clear net
   net.IW{1,1} = zeros(m,1);
   net.LW{2,1} = zeros(1,m);
   net.b{1,1}  = zeros(m,1);
   net.b{2,1}  = zeros(1,1);
   net = newff(minmax(Pn),[m,1],{'logsig','purelin'},'trainlm');
   net.trainParam.show   = 100;
   net.trainParam.lr     = 0.09;
   net.trainParam.epochs = 1000;
   net.trainParam.goal   = 1e-3;
   [net,tr] = train(net,Pn,Tn);
   Tn_predicted = sim(net,Pn)              % prediction on the full first chunk
   Tn                                      % scaled targets, for comparison

The result (note that the outputs are scaled with the same reference. If you use standard (z-score) normalization instead, make sure you always apply the mean and std values from the 1st training set to all the rest):

Tn_predicted =

   -1.0000   -0.8000   -0.4000    0.1995    1.0000


Tn =

   -1.0000   -0.8000   -0.4000    0.2000    1.0000
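
As a side note on the normalization remark above: a minimal sketch of reusing the first chunk's statistics for every later chunk, assuming z-score normalization and hypothetical chunk matrices P1 and P2 (one sample per column):

    % Compute the statistics on the FIRST chunk only...
    mu    = mean(P1, 2);
    sigma = std(P1, 0, 2);
    % ...and apply the SAME statistics to every later chunk.
    P1n = bsxfun(@rdivide, bsxfun(@minus, P1, mu), sigma);
    P2n = bsxfun(@rdivide, bsxfun(@minus, P2, mu), sigma);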

Now we run the second training process, with the training data [1,9]:

   Pt = [1 9];
   Tt = Pt.^2;
   n = length(Pt);
   % tramnmx applies the scaling learned from the first chunk (minP/maxP, minT/maxT)
   Ptn = tramnmx(Pt,minP,maxP);
   Ttn = tramnmx(Tt,minT,maxT);

   [net,tr] = train(net,Ptn,Ttn);          % continue training the existing net
   Tn_predicted = sim(net,Pn)              % evaluate on the FULL first chunk
   Tn

The result:

Tn_predicted =

   -1.0000   -0.8000   -0.4000    0.1995    1.0000


Tn =

   -1.0000   -0.8000   -0.4000    0.2000    1.0000

Note that the points x = [3,5,7] are still predicted precisely.

However, if we train on only x = [1,9] from the very beginning:

   % same setup as before, but this net only ever sees the [1,9] chunk
   clear net
   net.IW{1,1} = zeros(m,1);
   net.LW{2,1} = zeros(1,m);
   net.b{1,1}  = zeros(m,1);
   net.b{2,1}  = zeros(1,1);
   net = newff(minmax(Ptn),[m,1],{'logsig','purelin'},'trainlm');
   net.trainParam.show   = 100;
   net.trainParam.lr     = 0.09;
   net.trainParam.epochs = 1000;
   net.trainParam.goal   = 1e-3;
   [net,tr] = train(net,Ptn,Ttn);
   Tn_predicted = sim(net,Pn)
   Tn

Watch the result:

Tn_predicted =

   -1.0071   -0.6413    0.5281    0.6467    0.9922


Tn =

   -1.0000   -0.8000   -0.4000    0.2000    1.0000

Note that this net does not perform well on x = [3,5,7].

The test above indicates that training continues from the previous net rather than restarting from scratch. The reason you see worse performance is that you make only a single pass over each data chunk (stochastic gradient descent rather than batch gradient descent), so the total error may not have converged yet. Suppose you had only two data chunks: you would need to re-train on chunk 1 after finishing chunk 2, then chunk 2 again, then chunk 1, and so on until some stopping condition is met. If you have many more chunks, the effect of the 2nd pass relative to the 1st matters less. Online learning simply drops the previous data, regardless of whether the updated weights hurt performance on it.
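
A minimal sketch of that alternation for the two-chunk case, assuming Pn1/Tn1 and Pn2/Tn2 are the two normalized chunks and maxPasses and goalErr are hypothetical stopping parameters:

    % Keep cycling over both chunks until the combined error is small enough.
    for pass = 1:maxPasses
        [net, tr] = train(net, Pn1, Tn1);    % re-train on chunk 1
        [net, tr] = train(net, Pn2, Tn2);    % re-train on chunk 2
        err = mean(mean(abs([Tn1 Tn2] - sim(net, [Pn1 Pn2]))));
        if err < goalErr, break; end
    end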

  • Thank you for your response. In my case I have a very large data set and my goal is to predict 90 points ahead, so I have input values and an output point. For example, I train the first 1000 data points until they reach minimum error with >95% accuracy. After that is trained, when I do the same for the second 1000-point portion of the data, it overwrites the weights and the predictor mainly behaves like the latest trained portion of the data. I wasn't able to come up with a solution to this problem. In iterative training, should I keep the learning rate and the number of epochs small? Thanks. – alper Jan 18 '14 at 11:24
  • For each iteratively trained portion of the data, I always keep training until the training data reaches >90% accuracy. – alper Jan 18 '14 at 11:28
  • Instead of the first train I was doing adapt(). I tried your suggestion but I couldn't see any improvement, unfortunately. As proof, on the first train the training data fits perfectly, but after I train the next 1000-point portion of the data, the previously fit data suddenly doesn't fit anymore. The prediction's behaviour still focuses on the last trained portion of the data and completely ignores the already trained portion. What should the optimal alpha (learning rate) be for the adapt() function? And inside the while loop, should trainSamples.P and trainSamples.T be only the current block of training samples? – alper Jan 18 '14 at 13:55

Here is an example of how to train a NN iteratively (mini-batch) in Matlab.

First, create a toy dataset:

[ x,t] = building_dataset;

Mini-batch size and number of batches:

M = 420;
imax = 10;

Let's compare direct training against mini-batch training:

net = feedforwardnet(70,'trainscg');
dnet = feedforwardnet(70,'trainscg');

Standard training: one single call with the whole data set:

dnet.trainParam.epochs=100;
[ dnet tr y ] = train( dnet, x, t ,'useGPU','only','showResources','no');

A measure of error: MAE here; it is just as easy to use MSE or any other metric you want:

dperf = mean(mean(abs(t-dnet(x))))

This is the iterative part: 1 epoch per call:

net.trainParam.epochs=1;
perf = Inf;   % initialize so the while condition below has a value on the first pass
e=1;

Loop until we reach the error of the previous method, for an epoch-level comparison:

while perf(end)>dperf

It is very important to randomize the data at each epoch!

    idx = randperm(size(x,2));

Train iteratively on all the data chunks:

    for i=1:imax
        k = idx(1+M*(i-1) : M*i);
        [ net tr ] = train( net, x( : , k ), t( : , k ) );
    end

Compute the performance at each epoch:

    perf(e) = mean(mean(abs(t-net(x))))
    e=e+1;
end

Check the performance; we want a nice quasi-smooth, exp(-x)-like decay curve:

plot(perf)
  • Welcome to Stack Overflow, and thank you for your submission! In order to help us provide a good resource for the largest number of programmers, could you please update your answer with some explanation of why this is a good solution to the question? – gariepy Mar 30 '16 at 19:57