I'm trying to improve the performance of my code. The code basically calculates the value of L_total (1x2640) by fetching data from another variable called L_CN (1320x6). I also have a colindexes matrix (2640x3) which stores the rows to look at in L_CN.
So, how this goes is: the code looks at colindexes to get the row data. Say colindexes is of the following form:

55 65 75
68 75 85
...

The program will calculate L_total(1) as L_CN(55,1) + L_CN(65,1) + L_CN(75,1). The first index is the row number taken from the colindexes matrix; the second index is the number of times that row number has occurred so far. Therefore, when we calculate L_total(2), it will be L_CN(68,1) + L_CN(75,2) + L_CN(85,1). The second index of L_CN(75,2) is 2 because L_CN(75,1) was already used before.
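To make the indexing rule concrete, here is a small sketch of the first two steps (the values placed in L_CN are made up purely for illustration, and the counter name counts is mine):

```matlab
% toy setup: only rows 55, 65, 68, 75, 85 of L_CN matter in this example
L_CN = zeros(1320, 6);
L_CN(55,1) = 1; L_CN(65,1) = 2; L_CN(75,1:2) = [3 30];
L_CN(68,1) = 4; L_CN(85,1) = 5;

colindexes = [55 65 75; 68 75 85];
counts = zeros(1320, 1);     % how many times each row of L_CN has been used

% step 1: rows 55, 65, 75 are all seen for the first time
counts(colindexes(1,:)) = counts(colindexes(1,:)) + 1;
L_total1 = L_CN(55,1) + L_CN(65,1) + L_CN(75,1);   % columns are all 1

% step 2: row 75 is seen for the second time, so its column index becomes 2
counts(colindexes(2,:)) = counts(colindexes(2,:)) + 1;
L_total2 = L_CN(68,1) + L_CN(75,2) + L_CN(85,1);   % note the (75,2)
```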
To calculate the entire L_total matrix, the following code works well. It tracks the number of occurrences of each index by incrementing the corresponding entry of a variable called list (2640x1) and uses that to calculate L_total. It does so in 0.023715 seconds. (Note that n is 2640 below.)
for i = 1:n
    % bump the occurrence count for the three rows used in this step
    list(colindexes(i,:)) = list(colindexes(i,:)) + 1;
    % pick L_CN(row, occurrence) for each of the three rows and sum them
    L_total(i) = sum(diag(L_CN(colindexes(i,:), list(colindexes(i,:)))));
end
The problem is that I will be running this portion of the code over and over, maybe a million times, as part of a big simulation. Hence, even the smallest increase in performance is what I am after. First, I thought getting rid of the for loop would serve this purpose and switched the code to the following, with a little help from this topic: Vector of the occurence number:
list_col = reshape(colindexes', 1, []);    % rows laid out in usage order (1x7920)
occurrence = sum(triu(bsxfun(@eq, list_col, list_col.')));  % running count of each value
list = reshape(occurrence, 3, [])';        % back to 2640x3, aligned with colindexes
straight_index = colindexes + (list - 1)*k;  % k = size(L_CN,1) = 1320; linear indices
L_total = sum(L_CN(straight_index), 2)';
This code also does the job, with list_col (1x7920), occurrence (1x7920), list (2640x3) and straight_index (2640x3). However, contrary to my expectations, it takes 0.062168 seconds, about three times worse than the for-loop implementation. 0.05217 seconds of this is due to the second line, where the occurrence matrix is formed. With array sizes like mine, it is truly inefficient to find the occurrences this way.
The question is, with or without the for loop, how can I increase the performance of this code? The vectorization method seems nice, if only I can figure out a way to calculate the occurrence matrix faster. As I said, this portion of code will be run a lot of times, so any percentage increase in performance is welcome.
Thank you!
Further info:
colindexes represents a big matrix of size 1320x2640. Instead of storing this entire matrix, I only store the row locations of the '1's in colindexes; the rest is zero. So the colindexes I specified in the question means there is a '1' in the 1st column, 55th row, in the 2nd column, 85th row, and so on. Hence the min/max range is 1 to 1320. There are exactly 3 '1's in each column, which is why its size is 2640x3. This is, of course, background information on how it is formed. If it helps, the number of occurrences of each value in colindexes is also the same, namely 6.
So, for a matrix A = [1 0 0 1; 0 1 1 0], the colindexes is [1; 2; 2; 1].
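For what it's worth, one way this mapping could be reproduced in MATLAB (a sketch, not necessarily how it is built in my simulation) is with find, which returns the row positions of nonzeros scanned column by column:

```matlab
A = [1 0 0 1; 0 1 1 0];
[r, ~] = find(A);        % rows of the '1's, in column-major order
colindexes = r;          % gives [1; 2; 2; 1]
% with exactly 3 ones per column (as in the real problem), the 7920
% row indices could be reshaped to the 2640x3 layout:
% colindexes = reshape(r, 3, []).';
```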