4

It will be a long question, please, take a deep breath before reading.

I want to understand what would be the fastest algorithm to convert index of one dimensional array to a vector index of a multidimensional array.

Let's proceed with an example to understand why do I need it:

I have a 2 dimensional array: Array[i1][i2]

i1 runs from i1_b=0 to i1_e=2

i2 runs from i2_b=0 to i2_e=1

So this array is outputted into the file line by line:

Array[0][0]

Array[0][1]

Array[0][2]

Array[1][0]

Array[1][1]

Array[1][2]

Now I read the file line by line and index k is the number of the line being read last.

I read the first line which is Array[0][0] and k=0

I read the second line which is Array[0][1] and k=1

...

One can notice that k will run from k_b=0 to k_e=5 and

k=0 will correspond to i1=0, i2=0

k=1 will correspond to i1=0, i2=1

...

Problem: So my problem is how to convert k into i1 and i2 the fastest way possible? (I don't need it while reading the file, but later in my program)

In this example, one of the solutions would be

i1=k/(i1_e - i1_b + 1);

i2=k%(i1_e - i1_b + 1);

Question 1: Is it the fastest possible solution in term of cycles and computer time?

OK. Question 2: How can we generalize this algorithm to multidimensional arrays?

Array[i1][i2][i3][i4]

i1=k/(i1_e - i1_b + 1);

i2=k%(i1_e - i1_b + 1);

i3=i2/(i1_e - i1_b + 1);

i4=i2%(i1_e - i1_b + 1);

Question 3: Is it the fastest way to do it?

Question 4: related question would be what is the latency for modular division, integer division, adding integers and multiplying integers? If these numbers depend on the architecture, please, also let me know.

Thanks in advance!

P.S. It may be easier for someone to think about this problem as the fastest algorithm to convert seconds into days-hours-minutes-seconds.

Community
  • 1
  • 1
user1541776
  • 497
  • 4
  • 14
  • Though the __real__ question is, will it matter; have you profiled? – orlp Jul 20 '12 at 21:04
  • @nightcracker: Sometimes your requirement is *as fast as possible*. Not all code is simple UI stuff where small delays are meaningless. I work on some image processing code from time to time. Right now it is fast enough as to meet our requirements. However, if/when I have an idea on how to make it faster, I will, because making these algorithms faster makes our software noticeably faster, which in turn makes it better. I'm not trying to say you're wrong, of course he should profile. However, it sounds to me as if the OP is asking for the fastest solution possible, which is different. – Ed S. Jul 20 '12 at 23:17
  • @nightcracker In my particular case time is not an issue, but EdS got my point - I just want to know whether there exists the fastest solution to use in all my future codes – user1541776 Jul 22 '12 at 16:53

2 Answers2

7

Question 2: How can we generalize this algorithm to multidimensional arrays?

If you have an array arr[dim_1][dim_2]...[dim_n], you have the equation

k = i_1*(dim_2*...*dim_n) + i_2*(dim_3*...*dim_n) + ... + i_{n-1}*dim_n + i_n
  = i_1*(dim_2*...*dim_n) + r_2

so i_1 = k / (dim_2*..*dim_n) and r_2 = k % (dim_2*...*dim_n), then

i_2 = r_2 / (dim_3*...*dim_n) and r_3 = r_2 % (dim_3*...*dim_n)

etc,

i_j = r_j / (dim_{j+1}*...*dim_n) and r_{j+1} = r_j % (dim_{j+1}*...*dim_n)

until i_n = r_n is found.

Question 3: Is it the fastest way to do it?

If the dimensions are known at compile time, the divisions can be replaced by multiplications, shifts and additions/subtractions. On many architectures, that is faster than a division instruction. On others, not.

But it's only worthwhile thinking about if you're doing a lot of indexing in that array and not much else.

Question 4: related question would be what is the latency for modular division, integer division, adding integers and multiplying integers? If these numbers depend on the architecture, please, also let me know.

These numbers depend on the architecture and processor.

Daniel Fischer
  • 181,706
  • 17
  • 308
  • 431
  • thank you very much for you answer! I am left only with one question: if the algorithm you provided answering Question 2 is the fastest one? I am curious, because I guess all programmers need to "convert indexes", so there should be "the best known solution" or at least an investigation whether this "best solution" exists and what does it depend on. – user1541776 Jul 22 '12 at 17:02
  • Normally, the conversion is left to the compiler, and goes the other way, an access `arr[i1][i2][i3]` is converted to `base_address + (i1*(d2*d3) + i2*d3 + i3)*sizeof(element_type)`. If the dimensions are known at compile time, the compiler should find the fastest way for the conversion. Whether that means using this form and inserting the constants `d2*d3` etc as compile-time computed literals, or evaluating it a la Horner's rule may depend, but there's no fundamentally faster way. Going from the offset to the index tuple is much rarer, and I don't know of a faster way than division... – Daniel Fischer Jul 22 '12 at 17:53
  • ... but on the hardware I have experience with, transforming the divisions to shifts, multiplications and additions, if the dimensions are statically known, is significantly faster. Of course, if all dimensions are powers of 2, that's the absolute best case, then you need only shifts and masks. – Daniel Fischer Jul 22 '12 at 17:56
  • thank you very much for your help, all your exhaustive and lucid answers! – user1541776 Jul 23 '12 at 00:22
0

please find below how I would implement this in C++1x, I hope this can be useful. Cheers

#include <iostream>
#include <array>
#include <algorithm>


/* stream arrays element by element to ostream */
template<size_t N, typename T>
std::ostream& operator<<(std::ostream& os, std::array<T, N> const&  obj)
{
  os << "{ ";
  for(auto a:obj)std::cout << a << " ";
  std::cout << "}";

  return os;
}

//end of recursion
template<size_t DIM, size_t I>
constexpr typename std::enable_if< (I==DIM), void  >::type
get_indexes(unsigned int index, std::array<unsigned int, DIM> const& depths, std::array<unsigned int,DIM>& indexes)
{}

//begin of the recursion
template<size_t DIM, size_t I=0>
constexpr typename std::enable_if< (I<DIM), void  >::type
get_indexes(unsigned int index, std::array<unsigned int, DIM> const& depths, std::array<unsigned int,DIM>& indexes)
{
    unsigned int  factor    =  1;
    for(unsigned int i=I+1; i<DIM; i++) factor *=depths[i];
    indexes[I]  =  index/factor;
    unsigned int next_index =  index%factor;
    get_indexes<DIM, I+1>(next_index, depths, indexes );
}

//some testing with 3 dimensions 
int main()
{
  constexpr unsigned ndimensions=3;
  std::array<unsigned int, ndimensions> depths{2, 3, 4};
  unsigned int nboxes = 1;

  for(unsigned int i =0; i< ndimensions; i++)nboxes *=depths[i];

  std::array<unsigned int, ndimensions> indexes;

  for(size_t i=0; i<nboxes; i++)
  {

    get_indexes<ndimensions>(i,depths ,  indexes);
    std::cout << i << " -> " <<indexes<< std::endl; 
  }


return 0;
}