I need help optimizing, in both memory and speed, the calculation of the weighted geometric mean (WGM) of some data.
To introduce the problem, here is a small example. For each column j of A, the WGM is (prod_i A(i,j)^w(i))^(1/sum(w)), and I wrote the code below to calculate it:
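```matlab
% A:   example 3x3 data matrix
% w:   3x1 column vector of weights
% wgm: 1x3 row vector, the WGM of each column of A
A = rand(3);
w = [1, 2, 6]';
% A.^w raises row i of A to the power w(i) (implicit expansion, R2016b or later)
wgm = prod(A.^w).^(1/sum(w));
```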
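Because prod(A.^w) multiplies many values in (0,1) raised to possibly large exponents, it can underflow to 0 for bigger n or larger weights. An equivalent formulation works in the log domain: log(wgm) = (w' * log(A)) / sum(w), i.e. a weighted arithmetic mean of log(A) followed by exp. A minimal sketch (my addition, assuming every entry of A is strictly positive, since log(0) is -Inf) comparing the two:

```matlab
% Same toy data as above: A is 3x3, w is a 3x1 column of weights
A = rand(3);
w = [1, 2, 6]';

% Direct formula (implicit expansion: row i of A is raised to w(i))
wgm_direct = prod(A.^w).^(1/sum(w));

% Log-domain formula: w' * log(A) is 1x3, one weighted sum of logs per
% column of A; dividing by sum(w) and applying exp gives the WGM.
% Assumes all entries of A are > 0.
wgm_log = exp((w' * log(A)) / sum(w));

% Both are 1x3 row vectors that agree up to rounding
disp(max(abs(wgm_direct - wgm_log)))
```

Besides avoiding underflow, this form expresses the computation as a matrix product, which is the kind of operation GPUs handle well.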
Now for the general problem:
Suppose now that A is an n x m matrix and W is a matrix whose columns are weight vectors, each weight being an integer from 0 to k, and that I need every possible column (all combinations of weights). W is then of size n x (k+1)^n. Because multiplying all the weights of a column by the same positive factor does not change the weighted geometric mean, W should be reduced by excluding every column that is a scalar multiple of a column already included.
For example, if the column [1, 1, 0] is already included, every t*[1, 1, 0] with t from 2 to k must be excluded. Likewise, [1 2 3] excludes [2 4 6], [3 6 9], and so on. The all-zero column must be excluded as well, since 1/sum(w) is undefined for it.
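The redundancy follows from the formula: for any t > 0, prod(A.^(t*w))^(1/(t*sum(w))) equals prod(A.^w)^(1/sum(w)), because t cancels in the exponent. One way to detect scalar multiples (my suggestion, not part of the original approach) is to reduce each nonzero integer column to a canonical form by dividing it by the greatest common divisor (gcd) of its entries; two integer columns are multiples of each other exactly when their canonical forms coincide. A minimal check on the example columns, assuming A > 0:

```matlab
A = rand(3);
w1 = [1 2 3]';
w2 = [2 4 6]';
w3 = [3 6 9]';

% WGM via the log-domain formula (assumes A > 0)
wgm = @(w) exp((w' * log(A)) / sum(w));

% Scaling all weights by t > 0 leaves the WGM unchanged
d12 = max(abs(wgm(w1) - wgm(w2)));
d13 = max(abs(wgm(w1) - wgm(w3)));

% Canonical form: divide the column by the gcd of its entries.
% gcd is elementwise for two arguments, so fold it over the column.
g = 0;
for i = 1:numel(w3)
    g = gcd(g, w3(i));   % gcd(0, x) = x, so starting from 0 is safe
end
canon3 = w3 / g;         % [3 6 9]' -> [1 2 3]', the same as w1
```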
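Basic idea: each generated column of W could be normalized so that all of its scalar multiples map to the same normalized column (dividing by the constant k alone does not achieve this; dividing each weight by the greatest common divisor of the column does). If the normalized column is redundant, it is not added. The kept columns are then stored as uint8, which uses 12.5% of the memory of double (1 byte instead of 8 per weight).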
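A brute-force sketch of building the reduced W for toy sizes, assuming a gcd criterion for redundancy: a column is kept only if it is nonzero and the gcd of its entries is 1, so each family of scalar multiples is represented exactly once by its smallest member. It enumerates all (k+1)^n columns, so it only illustrates the idea; the variable names are illustrative.

```matlab
n = 3;   % number of rows of A (toy size, illustrative)
k = 2;   % weights range over 0..k (illustrative)

% Enumerate all (k+1)^n columns: column c holds the base-(k+1) digits of c-1
N = (k+1)^n;
idx = 0:N-1;
W = zeros(n, N);
for r = 1:n
    W(r, :) = mod(floor(idx / (k+1)^(r-1)), k+1);
end

% gcd of each column, folded row by row (gcd works elementwise);
% the all-zero column ends with g = 0 and is dropped below
g = zeros(1, N);
for r = 1:n
    g = gcd(g, W(r, :));
end

% Keep nonzero columns whose entries have gcd 1, then store as uint8:
% 1 byte per weight instead of 8 for double (valid while k <= 255)
keep = g == 1;
W = uint8(W(:, keep));
```

For n = 3 and k = 2 this keeps 19 of the 27 columns: the all-zero column and the 7 nonzero columns with entries in {0, 2} are removed.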
As a real data example, suppose:
- A is a static 32x30 matrix (n = 32, m = 30).
- Weights range from 0 to 99 (k = 99).
- I need a way to create the W matrix, of size 32x100^32 before reduction, and to reduce it.
- I need to calculate the WGM matrix, of size 100^32x30 before reduction, where each row is the result computed from A and the corresponding column of W.
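To estimate how much the reduction helps at these sizes, the number of nonzero columns in {0,...,k}^n whose entries have gcd 1 can be counted by Möbius inversion: count = sum over d = 1..k of mu(d)*((floor(k/d) + 1)^n - 1), where mu is the Möbius function and (floor(k/d) + 1)^n - 1 counts the nonzero columns whose entries are all multiples of d. A sketch of this count (my addition, evaluated in double precision, so large results are approximate):

```matlab
n = 32;   % rows of A
k = 99;   % weights range over 0..k

% Moebius function mu(d) for d = 1..k via prime factorization
mu = zeros(1, k);
mu(1) = 1;
for d = 2:k
    f = factor(d);
    if numel(unique(f)) < numel(f)
        mu(d) = 0;                 % d has a squared prime factor
    else
        mu(d) = (-1)^numel(f);     % product of distinct primes
    end
end

% Nonzero columns with all entries divisible by d: (floor(k/d)+1)^n - 1.
% Moebius inversion keeps only those whose gcd is exactly 1.
d = 1:k;
count = sum(mu .* ((floor(k ./ d) + 1).^n - 1));

bytesFull = n * (k+1)^n;      % full W as uint8, 1 byte per weight
bytesReduced = n * count;     % reduced W as uint8
fprintf('full W: %.3g bytes, reduced columns: %.3g, reduced W: %.3g bytes\n', ...
        bytesFull, count, bytesReduced);
```

For n = 32 and k = 99 the count is still about 10^64 columns (roughly 3.2e65 bytes as uint8), so removing scalar multiples barely shrinks W. At this size W cannot be stored in any memory, and the columns would have to be generated and consumed block by block, or the set of candidate weights restricted further.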
The problems to solve are therefore:
- Creating the reduced weight matrix W efficiently, both in memory and in time.
- Calculating the WGM matrix.
- Allocating and partitioning these matrices to avoid running out of memory.
- Porting the MATLAB code to run on a CUDA device (a GTX 1080 with 8 GB of video memory).
- Storing the final matrices efficiently.
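In the log domain, the WGM for many weight columns is a single matrix product: with L = log(A) (n x m) and a block Wb of b weight columns (n x b), the matching b x m block of the WGM matrix is exp((Wb' * L) ./ sum(Wb, 1)'). A minimal sketch that processes W in column blocks so only one block is in memory at a time, assuming A > 0 and no all-zero columns in W. The random W, the block size, and the useGPU flag are illustrative; with the Parallel Computing Toolbox, gpuArray and gather run the same product on a CUDA device.

```matlab
A = rand(32, 30);                     % static data matrix, n x m
W = uint8(randi([0 99], 32, 1000));   % hypothetical stand-in for a chunk of W
W(:, ~any(W, 1)) = [];                % drop all-zero columns (1/sum(w) undefined)

useGPU = false;                       % illustrative flag; needs Parallel Computing Toolbox
blockSize = 256;                      % illustrative: columns of W per step

L = log(A);                           % n x m, computed once (assumes A > 0)
if useGPU
    L = gpuArray(L);
end

N = size(W, 2);
WGM = zeros(N, size(A, 2));           % one row per column of W
for first = 1:blockSize:N
    cols = first:min(first + blockSize - 1, N);
    Wb = double(W(:, cols));          % uint8 -> double for the product
    if useGPU
        Wb = gpuArray(Wb);
    end
    % (b x n) * (n x m) weighted log sums, each row divided by its weight sum
    G = exp((Wb' * L) ./ sum(Wb, 1)');
    if useGPU
        G = gather(G);                % copy the block back to host memory
    end
    WGM(cols, :) = G;
end
```

At these sizes the direct prod(A.^w) formula would usually underflow to 0 (for example, 32 factors near 0.5 raised to a power near 50 give about 0.5^1600), which is another reason to work with logarithms.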
Additional information:
The rows of the WGM matrix will be validated against a set of stricter rules: non-compliant rows will be discarded, together with the corresponding columns of the final W matrix.
This validation could be applied earlier, while the two matrices are being created, which would reduce memory consumption, possibly at some cost in speed.
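A sketch of applying the validation while blocks are produced, so that non-compliant WGM rows and the matching W columns are never stored. The real rules are not given in the question; isCompliant below is a hypothetical placeholder, and storing the WGM values as single (4 bytes instead of 8) is an optional choice of mine.

```matlab
A = rand(32, 30);
W = uint8(randi([0 99], 32, 1000));   % hypothetical stand-in for a chunk of W
W(:, ~any(W, 1)) = [];                % no all-zero weight columns

% Hypothetical rule: a WGM row is compliant if all its values exceed 0.2
isCompliant = @(G) all(G > 0.2, 2);   % b x m -> b x 1 logical

L = log(A);                           % assumes A > 0
blockSize = 256;                      % illustrative block size
keptW = {};                           % compliant W columns, one cell per block
keptWGM = {};                         % matching WGM rows, one cell per block
for first = 1:blockSize:size(W, 2)
    cols = first:min(first + blockSize - 1, size(W, 2));
    Wb = double(W(:, cols));
    G = exp((Wb' * L) ./ sum(Wb, 1)');
    ok = isCompliant(G);                     % rows to keep in this block
    keptW{end+1} = W(:, cols(ok));           % stays uint8
    keptWGM{end+1} = single(G(ok, :));       % single: half the bytes of double
end
Wfinal = [keptW{:}];                  % 32 x (number of compliant columns)
WGMfinal = vertcat(keptWGM{:});       % (number of compliant rows) x 30
```

Row i of WGMfinal always corresponds to column i of Wfinal, so the two matrices stay aligned after filtering.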