I wrote a for loop whose duration strangely increases at each iteration, although the amount of data manipulated remains constant. The code is below, with:
- X [N*F]: a numpy array of N samples with F variables (features) each;
- parts [N]: a numpy array containing the participant number of each sample in X;
- model_filename: a file name template for each participant's model (i.e. I have one model per participant).
My goal is to apply the model of participant p to the data of participant p, and to save the outputs (i.e. N outputs in total).
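For concreteness, the inputs look roughly like this (the sizes, participant count and file name pattern below are only illustrative, not my real data):

    import numpy as np

    N, F = 1000, 20                       # illustrative sizes, not my real ones
    X = np.random.rand(N, F)             # N samples with F features each
    parts = np.random.randint(0, 21, N)  # participant number of each sample
    model_filename = "model_part_{0}.h5"  # one Keras model file per participant

The loop itself is: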
    from keras.models import load_model

    outputs = np.full((X.shape[0],), np.nan)
    for curr_part in np.unique(parts):
        print("processing participant {0}".format(curr_part))
        # I measured the duration of this call (d0)
        model = load_model(model_filename.format(curr_part))
        idx = (parts == curr_part)
        # I measured the duration of this call (d1)
        outputs[idx] = np.squeeze(model.predict(X[idx, :]))
Both d0 and d1 increase at each iteration of the loop (one iteration takes about 1.5 seconds at iteration 0 and around 8 seconds by iteration 20). I completely fail to understand why. Also, interestingly, if I run the code several times in ipython, the durations keep accumulating as long as I do not restart the kernel (i.e. on the second run, iteration 0 already takes around 8 seconds). Of course I want to run the code several times, so this issue is critical in the long run.
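For reference, I measured d0 and d1 with plain time.time() calls around the two lines, roughly like this (a simplified sketch of my actual loop body, not the exact code):

    import time

    t = time.time()
    model = load_model(model_filename.format(curr_part))
    d0 = time.time() - t    # duration of load_model

    idx = (parts == curr_part)

    t = time.time()
    outputs[idx] = np.squeeze(model.predict(X[idx, :]))
    d1 = time.time() - t    # duration of predict

    print("d0 = {0:.2f}s, d1 = {1:.2f}s".format(d0, d1))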
I also tried the following code, which takes approximately the same total time, although with it I cannot time each call separately:
    unik_parts = np.unique(parts)
    models = [(p, load_model(model_filename.format(p))) for p in unik_parts]
    outputs = [np.squeeze(m.predict(X[parts == p, :])) for p, m in models]
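Note that in this version outputs is a list of per-participant arrays rather than a single array aligned with X; to recover the flat array of the first version I would scatter the predictions back by participant, something like this (flat_outputs is just an illustrative name):

    # scatter the per-participant predictions back into one array aligned with X
    flat_outputs = np.full((X.shape[0],), np.nan)
    for (p, _), out in zip(models, outputs):
        flat_outputs[parts == p] = out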
Python version: 2.7
The models are Keras models.