-1

I want to be able to vectorize the for-loops of this function to then be able to parallelize it in octave. Can these for-loops be vectorized? Thank you very much in advance!

I attach the code of the function commenting on the start and end of each for-loop and if-else.

function [par]=pem_v(tsm,pr)

% tsm and pr are arrays of N by n. % par is an array of N by 8

tss=[27:0.5:32];
tc=[20:0.01:29];
N=size(tsm,1);
% main-loop
for ii=1:N
    % I extract the rows in each loop because each one represents a sample
    sst=tsm(ii,:); sst=sst'; %then I convert each sample to column vectors
    pre=pr(ii,:); pre=pre';
    % main-condition
    if isnan(nanmean(sst))==1;
        par(ii,1:8)=NaN;
    else
        % first sub-loop
        for k=1:length(tss);
            idxx=find(sst>=tss(k)-0.25 & sst<=tss(k)+0.25);
            out(k)=prctile(pre(idxx),90);
        end
        % end first sub-loop
        tp90=tss(find(max(out)==out));
        % second sub-loop
        for j=1:length(tc)
            cond1=find(sst>=tc(j) & sst<=tp90);
            cond2=find(sst>=tp90);
            pem=zeros(length(sst),1);
            A=[sst(cond1),ones(length(cond1),1)];
            B=regress(pre(cond1),A);
            pt90=B(1)*(tp90-tc(j));
            AA=[(sst(cond2)-tp90)];
            BB=regress(pre(cond2)-pt90,AA);
            pem(cond1)=max(0,B(1)*(sst(cond1)-tc(j))); 
            pem(cond2)=max(0,(BB(1)*(sst(cond2)-tp90))+pt90); 
            clear A B AA BB;
            E(j)=sqrt(nansum((pem-pre).^2)/length(pre));
            clear pem;
        end
        % end second sub-loop
        tcc=tc(find(E==min(E)));
        % sub-condition
        if(isempty(tcc)==1);
            par(ii,1:9)=NaN;
        else
            cond1=find(sst>=tcc & sst<=tp90);
            cond2=find(sst>=tp90);
            pem=zeros(length(sst),1);
            A=[sst(cond1),ones(length(cond1),1)];
            B=regress(pre(cond1),A);
            pt90=B(1)*(tp90-tcc);
            AA=[sst(cond2)-tp90];
            BB=regress(pre(cond2)-pt90,AA);
            pem(cond1)=max(0,B(1)*(sst(cond1)-tcc)); 
            pem(cond2)=max(0,(BB(1)*(sst(cond2)-tp90))+pt90);    
            RMSE=sqrt(nansum((pem-pre).^2)/length(pre));
            % outputs
            par(ii,1)=tcc;
            par(ii,2)=tp90;
            par(ii,3)=B(1);
            par(ii,4)=BB(1);
            par(ii,5)=RMSE;
            par(ii,6)=nanmean(sst);
            par(ii,7)=nanmean(pre);
            par(ii,8)=nanmean(pem);
        end
        % end sub-condition
        clear pem pre sst RMSE BB B tp90 tcc
    end
    % end main-condition
end
% end main-loop
Wolfie
  • 27,562
  • 7
  • 28
  • 55
JOSE
  • 13
  • 3
  • 3
    It would help to know what this code *does* - example inputs and expected outputs are necessary for us to verify any suggestions. Also, there are things you could do to speed up the code without even trying to vectorise, for instance removing the `clear` statements (and writing code so that they aren't needed) since `clear` is slow. – Wolfie Jan 20 '18 at 21:10
  • 2
    This is a lot of code to expect people to read through without any explanation or any attempt at vectorising it yourself. Please add an explanation of what your code does. – Dan Jan 21 '18 at 02:19
  • @Wolfie JOSE apparently likes dropping some code (for example https://stackoverflow.com/questions/48273556/octave-how-can-i-vectorize-this-function) and asking "how can I vectorize and parallelize it". Obviously what he really wants (but isn't asking for) is to reduze the execution time. There were some other related question but JOSE deletes them as soon as there are negative comments – Andy Jan 21 '18 at 11:27
  • @Andy Thanks for the information. As it happens, JOSE's code [could be sped up](https://stackoverflow.com/a/48361500/3978545) so I gave some tips, along with the revised code, which they will hopefully take on board in their future coding. It's not a great pattern though, I'm sure less people will be as inclined to help as time goes on. – Wolfie Jan 21 '18 at 11:59

1 Answers1

3

You haven't given any example inputs, so I've created some like so:

N = 5; n = 800; 
tsm = rand(N,n)*5+27; pr = rand(N,n);

Then, before you even consider vectorising your code, you should keep 4 things in mind...

  1. Avoid calulating the same thing (like the size of a vector) every loop, instead do it before looping
  2. Pre-allocate arrays where possible (declare them as zeros/NaNs etc)
  3. Don't use find to convert logical indices into linear indices, there is no need and it will slow down your code
  4. Don't repeatedly use clear, especially many times within loops. It is slow! Instead, use pre-allocation to ensure the variables are as you expect each loop.

Using the above random inputs, and taking account of these 4 things, the below code is ~65% quicker than your code. Note: this is without even doing any vectorising!

function [par]=pem_v(tsm,pr)
% tsm and pr are arrays of N by n. 
% par is an array of N by 8    
tss=[27:0.5:32];
tc=[20:0.01:29];
N=size(tsm,1);    
% Transpose once here instead of every loop
tsm = tsm';
pr = pr';

% Pre-allocate memory for output 'par'
par = NaN(N, 8);
% Don't compute these every loop, do it before the loop. 
% numel simpler than length for vectors, and size is clearer still
ntss = numel(tss);
nsst = size(tsm,1);
ntc = numel(tc);
npr = size(pr, 1);

for ii=1:N
    % Extract the columns in each loop because each one represents a sample
    sst=tsm(:,ii); 
    pre=pr(:,ii); 
    % main-condition. Previously isnan(nanmean(sst))==1, but that's only true if all(isnan(sst))
    % We don't need to assign par(ii,1:8)=NaN since we initialised par to a matrix of NaNs
    if ~all(isnan(sst));
        % first sub-loop, initialise 'out' first
        out = zeros(1, ntss);
        for k=1:ntss;
            % Don't use FIND on an indexing vector. Use the logical index raw, it's quicker
            idxx = (sst>=tss(k)-0.25 & sst<=tss(k)+0.25); 
            % We need a check that some values of idxx are true, otherwise prctile will error. 
            if nnz(idxx) > 0          
                out(k) = prctile(pre(idxx), 90);
            end
        end
        % Again, no need for FIND, just reduces speed. This is a theme...
        tp90=tss(max(out)==out);

        for jj=1:ntc
            cond1 = (sst>=tc(jj) & sst<=tp90);
            cond2 = (sst>=tp90);
            % Use nnz (numer of non-zero) instead of length, since cond1 is now a logical vector of all elements
            A = [sst(cond1),ones(nnz(cond1),1)];
            B = regress(pre(cond1), A);
            pt90 = B(1)*(tp90-tc(jj));
            AA = [(sst(cond2)-tp90)];
            BB = regress(pre(cond2)-pt90,AA);

            pem=zeros(nsst,1);
            pem(cond1) = max(0, B(1)*(sst(cond1)-tc(jj))); 
            pem(cond2) = max(0, (BB(1)*(sst(cond2)-tp90))+pt90); 

            E(jj) = sqrt(nansum((pem-pre).^2)/npr);
        end

        tcc = tc(E==min(E));

        if ~isempty(tcc);            
            cond1 = (sst>=tcc & sst<=tp90);
            cond2 = (sst>=tp90);

            A = [sst(cond1),ones(nnz(cond1),1)];
            B = regress(pre(cond1),A);
            pt90 = B(1)*(tp90-tcc);
            AA = [sst(cond2)-tp90];
            BB = regress(pre(cond2)-pt90,AA);

            pem = zeros(length(sst),1);
            pem(cond1) = max(0, B(1)*(sst(cond1)-tcc)); 
            pem(cond2) = max(0, (BB(1)*(sst(cond2)-tp90))+pt90);   

            RMSE = sqrt(nansum((pem-pre).^2)/npr);
            % Outputs, which we might as well assign all at once!
            par(ii,:)=[tcc, tp90, B(1), BB(1), RMSE, ...
                       nanmean(sst), nanmean(pre), nanmean(pem)];
        end
    end
end
Wolfie
  • 27,562
  • 7
  • 28
  • 55