UPDATE: the (sparse) three-dimensional matrix v
in my question below is symmetric: v(i1,i2,i3) = v(j1,j2,j3)
where (j1,j2,j3)
is any of the 6 permutations of (i1,i2,i3)
, i.e.
v(i1,i2,i3) = v(i1,i3,i2) = v(i2,i3,i1) = v(i2,i1,i3) = v(i3,i1,i2) = v(i3,i2,i1)
.
Moreover, v(i1,i2,i3) != 0
only when i1 != i2 && i1 != i3 && i2 != i3
.
E.g. v(i,i,j) = 0
, v(i, k, k) = 0
, v(k, j, k) = 0
, etc...
I thought that with this additional information, I could already get a significant speed-up by doing the following:
- Remark:
v
contains duplicate values (a triplet(i,j,k)
has 6 permutations, and the values ofv
for these 6 are the same). - So I defined a more compact matrix
u
that contains only non-duplicates ofv
. The indices ofu
are(i1,i2,i3)
wherei1 < i2 < i3
. The length ofu
is equal to the length ofv
divided by 6. - I computed the sum by iterating over the new value vector and the new index vectors.
With this, I only got a little speed-up. I realized that instead of iterating N
times doing a multiplication each time, I iterated N/6
times doing 6 multiplications each time, and that's pretty much the same as before :(
Hope somebody could come up with a better solution.
--- (Original question) ---
In my program I have an expensive operation that is repeated every iteration.
I have three n
-dimensional vectors x1
, x2
and x3
that are supposed to change every iteration.
I have four N
-dimensional vectors I1
, I2
, I3
and v
that are pre-defined and will not change, where:
I1
,I2
andI3
contain the indices of respectivelyx1
,x2
andx3
(the elements inI_i
are between0
andn-1
)v
is a vector of values.
For example:
We can see v
as a (reshaped) sparse three-dimensional matrix, each index k
of v
corresponds to a triplet (i1,i2,i3)
of indices of x1
, x2
, x3
.
I want to compute at each iteration three n
-dimensional vectors y1
, y2
and y3
defined by:
y1[i1] = sum_{i2,i3} v(i1,i2,i3)*x2(i2)*x3(i3)
y2[i2] = sum_{i1,i3} v(i1,i2,i3)*x1(i1)*x3(i3)
y3[i3] = sum_{i1,i2} v(i1,i2,i3)*x1(i1)*x2(i2)
More precisely what the program does is:
Repeat:
- Compute
y1
then updatex1 = f(y1)
- Compute
y2
then updatex2 = f(y2)
- Compute
y3
then updatex3 = f(y3)
where f
is some external function.
I would like to know if there is a C++ library that helps me to do so as fast as possible. Using for loops is just too slow.
Thank you very much for your help!
Update: Looks like it's not easy to get a better solution than the straight-forward for loops. If the vector of indices I1
above is ordered in non-decreasing order, can we compute y1
faster?
For example: I1 = [0 0 0 0 1 1 2 2 2 3 3 3 ... n n]
.