
I have a huge binary matrix, like 100000 x 100000.

Reading this article http://www.cs.up.ac.za/cs/vpieterse/pub/PieterseEtAl_SAICSIT2010.pdf, I gathered that the best tradeoff for storing and working with a binary matrix is boost::dynamic_bitset.

This is because in "Table 2: Relative time performance of the programs that implemented the data structures", std::vector<bool> is in last place, while boost::dynamic_bitset is in first place.

And in "Table 3: Relative memory usage of the programs that implemented the data structures": std::vector<bool> is in first position, but boost::dynamic_bitset is in second position.

Besides, on page 7 of the paper, there is the following statement:

"Despite the impressive memory performance of std::vector, its dismal time performance renders it unusable in large-scale applications."

And in the conclusions:

"We have shown that boost::dynamic_bitset is considerably more efficient than most of the other implementations in terms of execution speed, while the implementation using std::vector<char> outperformed the other implementations in terms of memory efficiency."

Now, in my case, my target machine is a Xeon Phi.
My target application is the Game of Life.
I have represented the binary matrix as a binary array of ROWS x COLS cells.

I have tried the code with 3 different configurations, building them with the icpc compiler and the -O3 optimization flag:

  1. Array of booleans
  2. Array of booleans + vectorization, i.e. changing the code to use Array Notation as described here (see the sketch after this list)
  3. boost::dynamic_bitset. In this case, I could not change the code to use Array Notation since, when I try, I get the following error:

    error: base of array section must be pointer or array type
    

    same error when using std::vector<bool>.
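
To make configuration 2 concrete, here is a simplified sketch of the kind of code I mean (ROWS, COLS, the scratch buffer cnt and the function name are only illustrative, not my actual program):

    // Configuration 2, roughly: the grid is a plain ROWS x COLS array of char
    // used as booleans, and the neighbour counts of one interior row are computed
    // with Cilk Plus array sections. The bases of the sections are raw pointers,
    // which is why the same expressions do not compile against std::vector<bool>
    // or boost::dynamic_bitset ("base of array section must be pointer or array type").
    const int ROWS = 100000, COLS = 100000;

    void step_row(const char* up, const char* mid, const char* down,
                  char* out, char* cnt /* caller-provided scratch row of COLS bytes */) {
        // Element-wise sum of the eight neighbours for the interior cells of this row.
        cnt[1:COLS-2] = up[0:COLS-2]   + up[1:COLS-2]   + up[2:COLS-2]
                      + mid[0:COLS-2]                   + mid[2:COLS-2]
                      + down[0:COLS-2] + down[1:COLS-2] + down[2:COLS-2];
        // Conway's rule; icpc can vectorize this plain loop as well.
        for (int j = 1; j < COLS - 1; ++j)
            out[j] = (cnt[j] == 3) || (cnt[j] == 2 && mid[j]);
    }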

Looking at the performance of a single iteration of the game's main loop for a 100000 x 100000 matrix, I found that solution 2 runs almost six times faster than solution 1, but, unexpectedly, solution 1 runs twice as fast as solution 3.

In conclusion, I have the following questions:

  1. What is, in general, the most efficient data structure to store and work with a HUGE MATRIX?
  2. Can I do better than the answer to question 1, knowing that my target machine is a Xeon Phi?
  3. Is it possible to apply vectorization to std::vector<bool> or boost::dynamic_bitset?

Thanks for the answer about the specific target application, Game of Life.
But what about working with a huge binary matrix in other contexts?

Draxent

1 Answer


If you REALLY care about performance in Conway's game of life, you should switch to a purely bit-parallel boolean math design. The simple task of counting 8 neighbors is annoyingly hard as a parallel boolean operation, but worth the trouble. The 64-way direct parallelism alone pays back the cost of the bitwise arithmetic many times over.

You might have some 128-bit or higher direct parallelism possible on some CPUs with the same basic design.

Once you are using 64-bit or bigger integers instead of bools, all issues of efficiently storing bools become irrelevant.
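
Here is a minimal sketch of the idea (mine, not tuned for the Xeon Phi): each 64-bit word holds 64 cells of a row, the eight neighbour masks are built with shifts, and the per-cell neighbour count is accumulated with bitwise full adders. For brevity it processes one word column and lets zeros shift in at the word edges, so a real kernel still has to OR in the edge bits of the adjacent words.

    #include <cstdint>

    // Add three 1-bit-per-lane operands: sum is the low bit, carry the high bit.
    static inline void full_add(uint64_t a, uint64_t b, uint64_t c,
                                uint64_t& sum, uint64_t& carry) {
        uint64_t t = a ^ b;
        sum   = t ^ c;
        carry = (a & b) | (t & c);
    }

    // Next state of 64 cells at once, given the word above, the word itself
    // and the word below. Bit j of each word is the cell in column j.
    uint64_t life_step_word(uint64_t up, uint64_t mid, uint64_t down) {
        // The eight neighbour masks, each aligned so that bit j is a neighbour of cell j.
        uint64_t n0 = up   << 1, n1 = up,   n2 = up   >> 1;
        uint64_t n3 = mid  << 1,            n4 = mid  >> 1;
        uint64_t n5 = down << 1, n6 = down, n7 = down >> 1;

        // Carry-save addition of the eight 1-bit values in every lane.
        uint64_t s1, c1, s2, c2, b0, c4, ts, tc;
        full_add(n0, n1, n2, s1, c1);
        full_add(n3, n4, n5, s2, c2);
        uint64_t s3 = n6 ^ n7, c3 = n6 & n7;   // half adder for the last pair

        full_add(s1, s2, s3, b0, c4);          // b0 = bit 0 of the neighbour count
        full_add(c1, c2, c3, ts, tc);
        uint64_t b1 = ts ^ c4;                 // bit 1 of the neighbour count
        uint64_t hi = tc | (ts & c4);          // set when the count is 4 or more

        // Alive next iff count == 3, or count == 2 and the cell is already alive.
        return b1 & ~hi & (b0 | mid);
    }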

When I did this in assembler decades ago, I found one important optimization was to share information between successive rows. When doing that, it became easier to look at the total of a block of nine cells rather than eight neighbors. So it may help to realize the rules can be compatibly restated:
When there are 3 in its set of 9, a cell turns on (whether it was on before or not).
When there are 4 in its set of 9, a cell is unchanged.
Otherwise it turns off.
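
In code form (s9 being my name for the count of live cells in the 3x3 block, including the cell itself):

    // Restated rule: equivalent to the usual "3 neighbours -> born,
    // 2 or 3 neighbours -> survives" formulation, since s9 = neighbours + cell.
    int next_state(int s9, int cell) {
        if (s9 == 3) return 1;     // turns on, whether it was on before or not
        if (s9 == 4) return cell;  // unchanged
        return 0;                  // turns off
    }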

The way info was shared across rows heavily depended on the asm language and register set of that machine decades ago, so you might or might not find that looking at the full 9 cells (instead of 8 neighbors) helps you.
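
Purely as an illustration of the row sharing, here is a driver for the word-sized kernel sketched above that reuses each row as it slides from "down" to "mid" to "up", assuming a board exactly one word (64 cells) wide:

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Uses life_step_word() from the earlier sketch. Rows outside the board are
    // treated as dead; next must already have the same size as cur.
    void life_step(const std::vector<uint64_t>& cur, std::vector<uint64_t>& next) {
        const std::size_t rows = cur.size();
        uint64_t up = 0, mid = rows ? cur[0] : 0;
        for (std::size_t i = 0; i < rows; ++i) {
            uint64_t down = (i + 1 < rows) ? cur[i + 1] : 0;
            next[i] = life_step_word(up, mid, down);
            up  = mid;    // slide the three-row window down instead of reloading
            mid = down;
        }
    }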

JSF
  • I think there are vastly faster ways to optimize Conway's game of life using memoized/precalculated cell configurations. – sehe Dec 30 '15 at 23:24
  • If you care about performance of Life you may want to do this. But if you care with all-caps bold "really", you'll probably want to skip the micro-optimisation and use Gosper's HashLife algorithm, which is likely to make this micro-optimization pointless. HashLife is just at an entirely different scale here. – R. Martinho Fernandes Dec 31 '15 at 00:05