Which data structure should be applied?

Question

There is a matrix of size n*n where n <= 500000. Initially all elements are 0. Every time there is an input, we have to update an entire row or column by a certain number.

Example:

n=3
RS 1 10

means that we have to update row 1 by 10:

0 0 0
0 0 0
0 0 0

After the update:

10 10 10
 0  0  0
 0  0  0

We have to do the same for columns. At the end, we have to count the number of 0s in the matrix.

As n is very large, a two-dimensional array cannot be used. So which data structure should be applied?

[Not identical, but related](http://stackoverflow.com/questions/14695582/range-update-and-querying-in-a-2d-matrix). — Bernhard Barker, Feb 05 '13 at 11:52
Does the update mean +N or set to N? Can a row/column be reset to zero? — Aki Suihkonen, Feb 05 '13 at 12:34

score 4 · Answer 1 · edited Feb 05 '13 at 12:03


Well, this is interesting. It would of course depend on the number of operations you are going to perform, but I would store it as two one-dimensional arrays: one with the row inputs and the other with the column inputs.

row[n] and col[n]

So when you want to know the value of, say, element (4,7), it would be row[4] + col[7].
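A minimal C++ sketch of this two-array idea, assuming that an update adds to the existing values (the question does not say whether it adds or overwrites) and that the input uses 1-based indices as in `RS 1 10`. The names `RowColMatrix`, `addToRow`, `addToCol` and `valueAt` are hypothetical, chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: keeps one accumulated value per row and per column
// instead of the full n*n matrix (assumes updates are additive).
struct RowColMatrix {
    std::vector<long long> row;  // total added to each row
    std::vector<long long> col;  // total added to each column

    explicit RowColMatrix(std::size_t n) : row(n, 0), col(n, 0) {}

    // 1-based indices, matching input such as "RS 1 10".
    void addToRow(std::size_t r, long long v) {
        assert(r >= 1 && r <= row.size());
        row[r - 1] += v;
    }
    void addToCol(std::size_t c, long long v) {
        assert(c >= 1 && c <= col.size());
        col[c - 1] += v;
    }

    // Element (r, c) is never stored; it is recomputed on demand.
    long long valueAt(std::size_t r, std::size_t c) const {
        assert(r >= 1 && r <= row.size() && c >= 1 && c <= col.size());
        return row[r - 1] + col[c - 1];
    }
};
```

For n = 3, calling `addToRow(1, 10)` makes every `valueAt(1, c)` equal 10 while the other rows stay 0, matching the matrix in the question. Storage is 2n values instead of n².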

answered Feb 05 '13 at 11:50 by Techmonk · edited Feb 05 '13 at 12:03 by Shai

Shai · Answer 2 · 2013-02-05T13:09:07.907

Taking @Techmonk's answer a bit further, I propose two approaches:

1. Techmonk's

O(1) per update, O(n^2) to recover the number of 0s

 #include <cstdint>
 #include <vector>

 class matZeroCount {
     std::vector< long long > m_rows;  // total update applied to each row
     std::vector< long long > m_cols;  // total update applied to each column
 public:
     explicit matZeroCount( unsigned int n ): m_rows( n, 0 ), m_cols( n, 0 ) {}
     void updateRow( unsigned int idx, long long update ) {
          // check idx range w.r.t m_rows.size()
          // ignore update == 0 case
          m_rows[ idx ] += update;
     }
     void updateCol( unsigned int idx, long long update ) {
          // check idx range w.r.t m_cols.size()
          // ignore update == 0 case
          m_cols[ idx ] += update;
     }
     // element (r, c) is m_rows[ r ] + m_cols[ c ]; visit all n^2 pairs
     std::uint64_t countZeros() const {
         std::uint64_t count = 0; // n^2 can exceed the range of unsigned int
         for ( auto ir = m_rows.begin(); ir != m_rows.end(); ++ir ) {
             for ( auto ic = m_cols.begin(); ic != m_cols.end(); ++ic ) {
                  count += ( ( *ir + *ic ) == 0 );
             }
         }
         return count;
     }
 };

2. Fast count

This method recovers the number of zeros in O(1), at the cost of O(n) per update. If you expect fewer than O(n) updates, this approach might be more efficient.

 #include <cstdint>
 #include <vector>

 class matZeroCount {
     std::vector< long long > m_rows;   // total update applied to each row
     std::vector< long long > m_cols;   // total update applied to each column
     std::uint64_t            m_count;  // current number of zero elements
 public:
     // the matrix starts out all zeros, so all n*n elements are counted
     explicit matZeroCount( unsigned int n ): m_rows( n, 0 ), m_cols( n, 0 ),
         m_count( static_cast< std::uint64_t >( n ) * n ) {}
     void updateRow( unsigned int idx, long long update ) {
          // check idx range w.r.t m_rows.size()
          // ignore update == 0 case
          m_rows[ idx ] += update;
          for ( auto ic = m_cols.begin(); ic != m_cols.end(); ++ic ) {
               m_count += ( ( m_rows[ idx ] + *ic ) == 0 ); // new zeros
               m_count -= ( ( m_rows[ idx ] - update + *ic ) == 0 ); // not zeros anymore
          }
     }
     void updateCol( unsigned int idx, long long update ) {
          // check idx range w.r.t m_cols.size()
          // ignore update == 0 case
          m_cols[ idx ] += update;
          for ( auto ir = m_rows.begin(); ir != m_rows.end(); ++ir ) {
               m_count += ( ( m_cols[ idx ] + *ir ) == 0 ); // new zeros
               m_count -= ( ( m_cols[ idx ] - update + *ir ) == 0 ); // not zeros anymore
          }
     }
     std::uint64_t countZeros() const { return m_count; }
 };
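The trade-off between the two versions can be written out with q, a symbol introduced here for the number of row/column updates performed before the single final count the question asks for (a rough comparison that ignores constant factors):

```latex
T_{\text{Techmonk's}} = q \cdot O(1) + O(n^2), \qquad
T_{\text{fast count}} = q \cdot O(n) + O(1)
```

So the fast-count version does less work roughly when q·n < n², i.e. when q < n; for n = 500000 that means fewer than about 500000 updates.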

You can calculate the number of zeros in O(n) in the first case as well; the answer should be NumZeros(row)*NumZeros(col). — Techmonk, Feb 05 '13 at 14:09
@Techmonk - I'm not sure I'm following your suggestion. If I assume `update` can be negative (as well as positive), don't I have to explicitly go through all `n^2` options? — Shai, Feb 05 '13 at 14:13
Since only the total matters in the end and we store the total for each row, it should not matter. E.g., imagine a 3x3 array where we get, for rows, 1 1, 2 2, 2 3, 1 -2, 2 -2, 3 -2. Then our array is -1, 0, 1; imagine the same for columns, giving -1, 0, 1, and thus the total number of zeros would be count(row)*count(col) = 1 — Techmonk, Feb 05 '13 at 14:57
@Techmonk (1) I guess you meant 1 1, 2 2, **3** 3, 1 -2, 2 -2, 3 -2. (2) For your example, wouldn't the resulting matrix be [-2 -1 0;-1 0 1;0 1 2] with total count = 3? — Shai, Feb 05 '13 at 15:12
Yes, you are right; I didn't look at negative cases properly. It would be n^2. — Techmonk, Feb 05 '13 at 16:41
@Techmonk - anyhow, thank you for your answer and comments - I really enjoyed them! — Shai, Feb 05 '13 at 16:46
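The thread settles on O(n²) for counting with the two-array layout, but element (r, c) is zero exactly when rows[r] == -cols[c], so the zeros can also be counted by grouping equal values: the count is the sum over every value v of (rows equal to v) × (columns equal to -v). Techmonk's NumZeros(row)*NumZeros(col) is only the v = 0 term of that sum. This is a technique not proposed in the thread; the sketch below assumes additive updates, and the function name `countZerosByValue` is hypothetical. With a hash map it runs in O(n) expected time.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical helper: counts pairs (r, c) with rows[r] + cols[c] == 0,
// i.e. the zero elements of the implicit n*n matrix, without visiting
// every element.
std::uint64_t countZerosByValue(const std::vector<long long>& rows,
                                const std::vector<long long>& cols) {
    assert(rows.size() == cols.size());  // the matrix is n*n

    // How many columns carry each accumulated value.
    std::unordered_map<long long, std::uint64_t> colFreq;
    for (long long c : cols) ++colFreq[c];

    std::uint64_t zeros = 0;
    for (long long r : rows) {
        // Every column whose value is -r cancels this row to zero.
        auto it = colFreq.find(-r);
        if (it != colFreq.end()) zeros += it->second;
    }
    return zeros;
}
```

For Shai's example, rows {-1, 0, 1} and columns {-1, 0, 1} give 3, matching the matrix [-2 -1 0;-1 0 1;0 1 2]. Sorting both arrays and merging would give a deterministic O(n log n) variant instead.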

Adi · Answer 3 · 2013-04-13T18:18:37.370


A sparse matrix is a data structure suited to matrices populated mostly with zeros. Its implementation is oriented towards space efficiency. It is appropriate for cases just like yours, when you have large matrices with very little information.
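One common sparse layout is a "dictionary of keys", which stores only the nonzero elements, so the number of zeros is n² minus the number of stored entries. A minimal C++ sketch, assuming additive updates and 0-based indices; the class and method names are hypothetical. Note that in this layout a row or column update still touches up to n entries.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>

// Dictionary-of-keys sparse matrix: only nonzero elements are stored.
class SparseMatrix {
    std::uint64_t n_;
    std::map<std::pair<std::uint64_t, std::uint64_t>, long long> nonzero_;

    // Adds v to element (r, c) and drops the entry if it returns to zero.
    void add(std::uint64_t r, std::uint64_t c, long long v) {
        auto key = std::make_pair(r, c);
        long long updated = (nonzero_[key] += v);
        if (updated == 0) nonzero_.erase(key);
    }

public:
    explicit SparseMatrix(std::uint64_t n) : n_(n) {}

    // Each call visits all n elements of the row or column.
    void addToRow(std::uint64_t r, long long v) {
        assert(r < n_);
        if (v == 0) return;
        for (std::uint64_t c = 0; c < n_; ++c) add(r, c, v);
    }
    void addToCol(std::uint64_t c, long long v) {
        assert(c < n_);
        if (v == 0) return;
        for (std::uint64_t r = 0; r < n_; ++r) add(r, c, v);
    }

    // Every element that is not stored is zero.
    std::uint64_t countZeros() const { return n_ * n_ - nonzero_.size(); }
};
```

With n = 3, `addToRow(0, 10)` stores three entries and `countZeros()` returns 6, as in the question's example.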

answered Feb 05 '13 at 12:44 by Adi · edited Apr 13 '13 at 18:18

score 1 · Answer 4 · answered Feb 05 '13 at 11:51

You might need a user-defined type that internally contains a std::list<std::list<int>>.

But really, can you hold 250000000000 integers in memory at the same time? I doubt it!
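The arithmetic behind that estimate, assuming a 4-byte `int` (common, but not guaranteed by C++):

```latex
n^2 = (5 \times 10^5)^2 = 2.5 \times 10^{11}\ \text{elements}, \qquad
2.5 \times 10^{11} \times 4\ \text{bytes} = 10^{12}\ \text{bytes} \approx 931\ \text{GiB}
```

By comparison, two one-dimensional arrays of n elements need 2 × 5×10⁵ × 8 bytes = 8 MB even with 8-byte elements.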

You might need a very different data structure, such as a two-dimensional array of integers stored in a memory-mapped file.
