4

I have a problem with describing algorithm for finding maximum rectangular area of binary data, where 1 occurs k-times more often than 0. Data is always n^2 bits like this:

For example data for n = 4 looks like:

1 0 1 0
0 0 1 1
0 1 1 1
1 1 0 1

Value of k can be 1 .. j (k = 1 means, that number of 0 and 1 is equal).

For above example of data and for k = 1 solution is:

1 0 1 0 <- 4 x '0' and 4 x '1'
0 0 1 1
0 1 1 1
1 1 0 1

But in this example:
1 1 1 0
0 1 0 0
0 0 0 0
0 1 1 1

Solution would be:
1 1 1 0
0 1 0 0
0 0 0 0
0 1 1 1

I tried with few brute force algorithms but for n > 20 it is getting too slow. Can you advise me how I should solve this problem?


As RBerteig proposed - the problem can be also described like that: "In a given square bitmap with cells set to 1 or 0 by some arbitrary process, find the largest rectangular area where the 1's and 0's occur in a specified ratio, k."

user445783
  • 41
  • 3
  • I'm not sure I understand the question. What is it that the algorithm is supposed to do? – casablanca Sep 12 '10 at 21:31
  • Algorithm should found a rectangular sector in given data (n x n bits) in which '1' occurs k-times more often than '0'. For k = 1, algorithm should found such fragment of data, that '1' occurs same times that '0'. If the data was 1 row I would use prefix sum, but in this case I think I'm unable to do that. – user445783 Sep 12 '10 at 21:38
  • Is this a correct restatement of the problem? "In a given square bitmap with cells set to 1 or 0 by some arbitrary process, find the largest rectangular area where the 1's and 0's occur in a specified ratio, `k`." – RBerteig Sep 12 '10 at 21:39
  • @RBerteig sound pretty good, is that a well known problem? – user445783 Sep 12 '10 at 21:59
  • @alg0, It doesn't ring any bells offhand, I was just trying to find a clearer way to state it in the hopes that someone does recognize it. – RBerteig Sep 12 '10 at 22:01
  • @alg0, out of shear curiosity, what is the application domain? – RBerteig Sep 12 '10 at 22:09
  • 4
    For the last example given above, why is the rectangle consisting of the complete three right-hand columns not 'the solution'? – Will A Sep 12 '10 at 22:15
  • @RBerteig, this problem is just a puzzle I wanted to solve it to practice my algorithm skills (the idea was born after solving few problems from http://projecteuler.net/ ) – user445783 Sep 12 '10 at 22:23
  • @Will A, because that three right columns is the biggest area where '1' occurs k (1 in this case) times '0' – user445783 Sep 12 '10 at 22:26
  • @alg0, its clearly an interesting puzzle... – RBerteig Sep 12 '10 at 22:26
  • 6 1's and 6 0's - a 1:1 ratio - this makes it (the rectangle consisting of the right three columns) the solution, surely? – Will A Sep 12 '10 at 22:27
  • @Will A, you are right, sorry for that. – user445783 Sep 12 '10 at 22:30
  • 1
    Worth noting (although very obvious) that for ratio to be k, the area of the rectangle must be a multiple of (k + 1) - this'll eliminate a lot of rectangles. – Will A Sep 12 '10 at 23:55

3 Answers3

3

Bruteforce should do just fine here for n < 100, if properly implemented: solution below has O(n^4) time and O(n^2) memory complexity. 10^8 operations should be well under 1 second on modern PC (especially considering that each operation is very cheap: few additions and subtractions).

Some observations

  1. There're O(n^4) sub-rectangles to consider and each of them can be a solution.
  2. If we can find number of 1's and 0's in each sub-rectangle in O(1) (constant time), we'll solve problem in O(n^4) time.
  3. If we know number of 1's in some sub-rectangle, we can find number of zeroes (through area).

So, the problem is reduced to following: create data structure allowing to find number of 1's in each sub-rectangle in constant time.

Now, imagine we have sub-rectangle [i0..i1]x[j0..j1]. I.e., it occupies rows between i0 and i1 and columns between j0 and j1. And let count_ones be the function to count number of 1's in subrectangle.

This is the main observation:

count_ones([i0..i1]x[j0..j1]) = count_ones([0..i1]x[0..j1]) - count_ones([0..i0 - 1]x[0..j1]) - count_ones([0..i1]x[0..j0 - 1]) + count_ones([0..i0 - 1]x[0..j0 - 1])

Same observation with practical example:

AAAABBB
AAAABBB
CCCCDDD
CCCCDDD
CCCCDDD
CCCCDDD

If we need to find number of 1's in D sub-rectangle (3x4), we can do it by taking number of 1's in the whole rectangle (A + B + C + D), subtracting number of 1's in (A + B) rectangle, subtracting number of 1's in (A + C) rectangle, and adding number of 1's in (A) rectangle. (A + B + C + D) - (A + B) - (A + C) + (A) = D

Thus, we need table sums, for each i and j containing number of 1's in sub-rectangle [0..i][0..j].
You can create this table in O(n^2), but even the direct way to fill it (for each i and j iterate all elements of [0..i][0..j] area) will be O(n^4).

Having this table,

count_ones([i0..i1]x[j0..j1]) = sums[i1][j1] - sums[i0 - 1][j1] - sums[i1][j0 - 1] + sums[i0 - 1][j0 - 1]

Therefore, time complexity O(n^4) reached.

Nikita Rybak
  • 67,365
  • 22
  • 157
  • 181
2

This is still brute force, but something you should note is that you don't have to recompute everything from scratch for a new i*j rectangle. Instead, for each possible rectangle size, you can move the rectangle across the n*n grid one step at a time, decrementing the counts for the bits no longer within the rectangle and incrementing the counts for the bits that newly entered the rectangle. You could potentially combine this with varying the rectangle size, and try to find an optimal pattern for moving and resizing the rectangle.

R.. GitHub STOP HELPING ICE
  • 208,859
  • 35
  • 376
  • 711
  • I was thinking about that solution but when I tried to estimate the number of steps I found it will be too complicated. For n = 200 (40,000 bits) there is 30,000 different sized "windows" which should be checked: len([(x, y) for x in range(1, 201) for y in range(1, 201) if not (x * y) % 2]) -- using Python to calculate that. BUT - each window show be moved across the n*n. This means that rectangle 2x1 should be moved (200 - 1) * (200 - 2) = 39402 times, 3x1 - 39203 ... etc. For n = 200, there will be ~300,000,000 moves. – user445783 Sep 12 '10 at 21:50
  • You could also cache the totals for each sub-rectangle visited, and use them for building aggregated totals of larger sub rectangles. – RBerteig Sep 12 '10 at 21:51
  • @RBerteig: For data: 0 0 0 0 | 0 0 0 0 | 1 1 1 1 | 1 1 1 1 | How could I use mentioned cache? – user445783 Sep 12 '10 at 21:54
  • Once you've summed the 2x2 at the upper left and got 0, remember that fact and the total for its neighboring 2x2 when calculating the sum for the 2x4 at the top. I think the resulting pyramid of resolutions trades off memory for revisiting the same pixel many times. – RBerteig Sep 12 '10 at 22:04
0

Just some hints..

You could impose better restrictions on the values. The requirement leads to condition

N1*(k+1) == S*k, where N1 is number of ones in an area, and S=dx*dy is its surface. It can be rewritten in better form:

N1/k == S/(k+1).

Because the greatest common divisor of numbers n and n+1 is always 1, then N1 have to be multiple of k and dx*dy to be multiple of k+1. It reduces greatly the possible space of solutions, the larger is k, the better (for dx*dy case you'll need to play with prime divisors of k+1).

Now, because you need just the surface of the largest area with such property, it would be wise to start from largest areas and move to smaller ones. By trying dx*dy from n^2 downto k+1 that would satisfy the divisor and the bounding conditions, you'll find quite fast the solution, muuuch faster than O(n^4), because of a special reason: except cases when the array was specially constructed, if we assume a random input, the probability that there are N1 ones out of S values in the (n-dx+1)*(n-dy+1) areas that have the surface S will constantly grow with decrease of S. (large values of k will make the probability smaller, but in the same time they will make the filter for dx and dy pairs stronger).

Also, this problem: http://ioinformatics.org/locations/ioi99/contest/land/land.shtml , looks somehow similar, maybe you'll find some ideas in their solution.

ruslik
  • 14,714
  • 1
  • 39
  • 40