9

I have randomly generated grid containing 0 and 1:

1 1 0 0 0 1 0 1
1 1 1 0 1 1 1 1
1 0 0 0 1 0 1 1
0 0 1 0 1 0 1 1
1 1 1 1 0 0 1 1
0 0 1 1 1 1 1 0
0 1 0 0 1 0 1 1

How can I iterate through the grid to find the largest cluster of 1s, that is equal or larger than 4 items (across row and column)?

I assume I need to keep a count of each found cluster while iterating and ones its more than 4 items, record and count in a list and then find the largest number.

The problem is that I cannot figure out how to do so across both rows and columns and record the count. I have iterated through the grid but not sure how to move further than two rows.

For example in the above example, the largest cluster is 8. There are some other clusters in the grid, but they have 4 elements:

A A 0 0 0 1 0 1
A A 1 0 1 1 1 1
1 0 0 0 1 0 1 1
0 0 1 0 1 0 1 1
1 1 B B 0 0 1 1
0 0 B B 1 1 1 0
0 1 0 0 1 0 1 1

The code I tried:

rectcount = []
for row in range(len(grid)):
    for num in range(len(grid[row])):

    # count = 0
        try:

            # if grid[row][num] == 1:
                # if grid[row][num] == grid[row][num + 1] == grid[row + 1][num] == grid[row + 1][num + 1]:
                    # count += 1

            if grid[row][num] == grid[row][num + 1]:
                if grid[row + 1][num] == grid[row][num + 1]:
                    count += 1

                # if grid[row][num] == grid[row][num + 1] and grid[row][num] == grid[row + 1][num]:
                    # count += 1
                else:
                    count = 0

            if grid[row][num] == grid[row + 1][num]:
                count += 1
        except:
            pass
W Szum
  • 171
  • 5
  • if you have a single 1 beside a cluster, isn't it considered part of the cluster? In other words, are the clusters necessarily rectangular-shaped? – Pablo C Oct 30 '20 at 06:52
  • Must each cluster span a minimum of two rows and a minimum of two columns? In your example, in the 5 th row from the top, there is a sequence of 4 consecutive `1`s, all in the same row. Is that to be considered as a cluster (even though it spans only one row)? – fountainhead Oct 30 '20 at 11:42
  • As the boundaries of the clusters get identified, are the clusters allowed to have partial overlaps? Or, should the boundaries be fixed in such a way that there are no overlaps (common `1`s) at all between any two clusters? – fountainhead Oct 30 '20 at 17:05
  • @PabloC the clusters can also be diagonal but only moved left/right by one value. I will add another example to the question. – W Szum Oct 31 '20 at 09:30
  • I still don't get it, can you define what a cluster is and what should be considered as a neighbor? – Pablo C Oct 31 '20 at 09:46
  • @PabloC Sorry for my poor explanation. You can think of a cluster as a rectangle across rows and columns that made up of 1s. To be considered as one, it should contain at least two consecutive 1s in a row and at least one row below with the same number of consecutive 1s. – W Szum Oct 31 '20 at 10:12
  • As examples, you have highlighted some of the `1` s in bold. That doesn't tell us anything about what the individual clusters are, how many clusters there are, or which of the highlighted `1` s belong to which cluster. And your comments are only adding to the confusion. Please figure out a better way provide the details I've just mentioned. – fountainhead Oct 31 '20 at 12:17
  • Perhaps try color coding the `1` s according to the cluster to which they belong. Or, better still, use the letter `A` in the place of all the `1` s that belong to a particular cluster, and then the letter `B` for a different cluster, and so on. Any `1` that does not belong to any cluster will remain as `1`. That will help us understand your examples better – fountainhead Oct 31 '20 at 12:20
  • @fountainhead I edited the question, please check. Not sure if it will be clearer now. – W Szum Oct 31 '20 at 20:58
  • @WSzum Can you also tell what is the largest possible grid size (num rows and cols)? And what is the time limit to solve the task for such largest grid? – Arty Oct 31 '20 at 21:11
  • @Arty the grid is always the same size, 10x10. My example shows 8x8 just as an example about how clusters are defined. – W Szum Oct 31 '20 at 22:15
  • It's enough to know the size of the largest cluster or you also want to identify its position? – Pablo C Nov 01 '20 at 06:34

1 Answers1

2

I've implemented three algorithms.

First algorithm is Simple, using easiest approach of nested loops, it has O(N^5) time complexity (where N is one side of input grid, 10 for our case), for our inputs of size 10x10 time of O(10^5) is quite alright. Algo id in code is algo = 0. If you just want to see this algorithm jump to line ------ Simple Algorithm inside code.

Second algorithm is Advanced, using Dynamic Programming approach, its complexity is O(N^3) which is much faster than first algorithm. Algo id in code is algo = 1. Jump to line ------- Advanced Algorithm inside code.

Third algorithm Simple-ListComp I implemented just for fun, it is almost same like Simple, same O(N^5) complexity, but using Python's list comprehensions instead of regular loops, that's why it is shorter, also a bit slower because doesn't use some optimizations. Algo id in code is algo = 2. Jump to line ------- Simple-ListComp Algorithm inside code to see algo.

The rest of code, besides algorithms, implements checking correctness of results (double-checking between algorithms), printing results, producing text inputs. Code is split into solving-task function solve() and testing function test(). solve() function has many arguments to allow configuring behavior of function.

All main code lines are documented by comments, read them to learn how to use code. Basically if s variable contains multi-line text with grid elements, same like in your question, you just run solve(s, text = True) and it will solve task and print results. Also you may choose algorithm out of two versions (0 (Simple) and 1 (Advanced) and 2 (Simple-ListComp)) by giving next arguments to solve function algo = 0, check = False (here 0 for algo 0). Look at test() function body to see simplest example of usage.

Algorithms output to console by default all clusters, from largest to smallest, largest is signified by . symbol, the rest by B, C, D, ..., Z symbols. You may set argument show_non_max = False in solve function if you want only first (largest) cluster to be shown.

I'll explain Simple algorithm:

  1. Basically what algorithm does - it searches through all possible angled 1s rectangles and stores info about maximal of them into ma 2D array. Top-left point of such rectangle is (i, j), top-right - (i, k), bottom-left - (l, j + angle_offset), bottom-right - (l, k + angle_offset), all 4 corners, that's why we have so many loops.
  2. In outer two i (row) , j (column) loops we iterate over whole grid, this (i, j) position will be top-left point of 1s rectangle, we need to iterate whole grid because all possible 1s rectangles may have top-left at any (row, col) point of whole grid. At start of j loop we check that grid at (i, j) position should always contain 1 because inside loops we search for all rectangle with 1s only.
  3. k loop iterates through all possible top-right positions (i, k) of 1s rectangle. We should break out of loop if (i, k) equals to 0 because there is no point to extend k further to right because such rectangle will always contain 0.
  4. In previous loops we fixed top-left and top-right corners of rectangle. Now we need to search for two bottom corners. For that we need to extend rectangle downwards at different angles till we reach first 0.
  5. off loop tries extending rectangle downwards at all possible angles (0 (straight vertical), +1 (45 degrees shifted to the right from top to bottom), -1 (-45 degrees)), off basically is such number that grid[y][x] is "above" (corresponds to by Y) grid[y + 1][x + off].
  6. l tries to extend rectangle downwards (in Y direction) at different angles off. It is extended till first 0 because it can't be extended further then (because each such rectangle will already contain 0).
  7. Inside l loop there is if grid[l][max(0, j + off * (l - i)) : min(k + 1 + off * (l - i), c)] != ones[:k - j + 1]: condition, basically this if is meant to check that last row of rectangle contains all 1 if not this if breaks out of loop. This condition compares two list slices for non-equality. Last row of rectangle spans from point (l, j + angle_offset) (expression max(0, j + off * (l - i)), max-limited to be 0 <= X) to point (l, k + angle_offset) (expression min(k + 1 + off * (l - i), c), min-limited to be X < c).
  8. Inside l loop there are other lines, ry, rx = l, k + off * (l - i) computes bottom-right point of rectangle (ry, rx) which is (l, k + angle_offset), this (ry, rx) position is used to store found maximum inside ma array, this array stores all maximal found rectangles, ma[ry][rx] contains info about rectangle that has bottom-right at point (ry, rx).
  9. rv = (l + 1 - i, k + 1 - j, off) line computes new possible candidate for ma[ry][rx] array entry, possible because ma[ry][rx] is updated only if new candidate has larger area of 1s. Here rv[0] value inside rv tuple contains height of such rectangle, rv[1] contains width of such rectangle (width equals to the length of bottom row of rectangle), rv[2] contains angle of such rectangle.
  10. Condition if rv[0] * rv[1] > ma[ry][rx][0] * ma[ry][rx][1]: and its body just checks if rv area is larger than current maximum inside array ma[ry][rx] and if it is larger then this array entry is updated (ma[ry][rx] = rv). I'll remind that ma[ry][rx] contains info (width, height, angle) about current found maximal-area rectangle that has bottom-right point at (ry, rx) and that has these width, height and angle.
  11. Done! After algorithm run array ma contains information about all maximal-area angled rectangles (clusters) of 1s so that all clusters can be restored and printed later to console. Largest of all such 1s-clusters is equal to some rv0 = ma[ry0][rx0], just iterate once through all elements of ma and find such point (ry0, rx0) so that ma[ry0][rx0][0] * ma[ry0][rx0][1] (area) is maximal. Then largest cluster will have bottom-right point (ry0, rx0), bottom-left point (ry0, rx0 - rv0[1] + 1), top-right point (ry0 - rv0[0] + 1, rx0 - rv0[2] * (rv0[0] - 1)), top-left point (ry0 - rv0[0] + 1, rx0 - rv0[1] + 1 - rv0[2] * (rv0[0] - 1)) (here rv0[2] * (rv0[0] - 1) is just angle offset, i.e. how much shifted is first row along X compared to last row of rectangle).

Try it online!

# ----------------- Main function solving task -----------------

def solve(
    grid, *,
    algo = 1, # Choose algorithm, 0 - Simple, 1 - Advanced, 2 - Simple-ListComp
    check = True, # If True run all algorithms and check that they produce same results, otherwise run just chosen algorithm without checking
    text = False, # If true then grid is a multi-line text (string) having grid elements separated by spaces
    print_ = True, # Print results to console
    show_non_max = True, # When printing if to show all clusters, not just largest, as B, C, D, E... (chars from "cchars")
    cchars = ['.'] + [chr(ii) for ii in range(ord('B'), ord('Z') + 1)], # Clusters-chars, these chars are used to show clusters from largest to smallest
    one = None, # Value of "one" inside grid array, e.g. if you have grid with chars then one may be equal to "1" string. Defaults to 1 (for non-text) or "1" (for text).
    offs = [0, +1, -1], # All offsets (angles) that need to be checked, "off" is such that grid[i + 1][j + off] corresponds to next row of grid[i][j]
    debug = False, # If True, extra debug info is printed
):
    # Preparing
    
    assert algo in [0, 1, 2], algo
    if text:
        grid = [l.strip().split() for l in grid.splitlines() if l.strip()]
    if one is None:
        one = 1 if not text else '1'
    r, c = len(grid), len(grid[0])
    sgrid = '\n'.join([''.join([str(grid[ii][jj]) for jj in range(c)]) for ii in range(r)])
    mas, ones = [], [one] * max(c, r)
    
    # ----------------- Simple Algorithm, O(N^5) Complexity -----------------
        
    if algo == 0 or check:
        ma = [[(0, 0, 0) for jj in range(c)] for ii in range(r)] # Array containing maximal answers, Lower-Right corners
        
        for i in range(r):
            for j in range(c):
                if grid[i][j] != one:
                    continue
                for k in range(j + 1, c): # Ensure at least 2 ones along X
                    if grid[i][k] != one:
                        break
                    for off in offs:
                        for l in range(i + 1, r): # Ensure at least 2 ones along Y
                            if grid[l][max(0, j + off * (l - i)) : min(k + 1 + off * (l - i), c)] != ones[:k - j + 1]:
                                l -= 1
                                break
                            ry, rx = l, k + off * (l - i)
                            rv = (l + 1 - i, k + 1 - j, off)
                            if rv[0] * rv[1] > ma[ry][rx][0] * ma[ry][rx][1]:
                                ma[ry][rx] = rv
                                
        mas.append(ma)
        ma = None
                    
    # ----------------- Advanced Algorithm using Dynamic Programming, O(N^3) Complexity -----------------

    if algo == 1 or check:
        ma = [[(0, 0, 0) for jj in range(c)] for ii in range(r)] # Array containing maximal answers, Lower-Right corners
        
        for off in offs:
            d = [[(0, 0, 0) for jj in range(c)] for ii in range(c)]
            for i in range(r):
                f, d_ = 0, [[(0, 0, 0) for jj in range(c)] for ii in range(c)]
                for j in range(c):
                    if grid[i][j] != one:
                        f = j + 1
                        continue
                    if f >= j:
                        # Check that we have at least 2 ones along X
                        continue
                    df = [(0, 0, 0) for ii in range(c)]
                    for k in range(j, -1, -1):
                        t0 = d[j - off][max(0, k - off)] if 0 <= j - off < c and k - off < c else (0, 0, 0)
                        if k >= f:
                            t1 = (t0[0] + 1, t0[1], off) if t0 != (0, 0, 0) else (0, 0, 0)
                            t2 = (1, j - k + 1, off)
                            t0 = t1 if t1[0] * t1[1] >= t2[0] * t2[1] else t2
                            
                            # Ensure that we have at least 2 ones along Y
                            t3 = t1 if t1[0] > 1 else (0, 0, 0)
                            if k < j and t3[0] * t3[1] < df[k + 1][0] * df[k + 1][1]:
                                t3 = df[k + 1]
                            df[k] = t3
                        else:
                            t0 = d_[j][k + 1]
                        if k < j and t0[0] * t0[1] < d_[j][k + 1][0] * d_[j][k + 1][1]:
                            t0 = d_[j][k + 1]
                        d_[j][k] = t0
                    if ma[i][j][0] * ma[i][j][1] < df[f][0] * df[f][1]:
                        ma[i][j] = df[f]
                d = d_
                
        mas.append(ma)
        ma = None
        
    # ----------------- Simple-ListComp Algorithm using List Comprehension, O(N^5) Complexity -----------------
        
    if algo == 2 or check:
        ma = [
            [
                max([(0, 0, 0)] + [
                    (h, w, off)
                    for h in range(2, i + 2)
                        for w in range(2, j + 2)
                            for off in offs
                    if all(
                        cr[
                            max(0, j + 1 - w - off * (h - 1 - icr)) :
                            max(0, j + 1 - off * (h - 1 - icr))
                        ] == ones[:w]
                        for icr, cr in enumerate(grid[max(0, i + 1 - h) : i + 1])
                    )
                ], key = lambda e: e[0] * e[1])
                for j in range(c)
            ]
            for i in range(r)
        ]
        mas.append(ma)
        ma = None
    
    # ----------------- Checking Correctness and Printing Results -----------------

    if check:
        # Check that we have same answers for all algorithms
        masx = [[[cma[ii][jj][0] * cma[ii][jj][1] for jj in range(c)] for ii in range(r)] for cma in mas]
        assert all([masx[0] == e for e in masx[1:]]), 'Maximums of algorithms differ!\n\n' + sgrid + '\n\n' + (
            '\n\n'.join(['\n'.join([' '.join([str(e1).rjust(2) for e1 in e0]) for e0 in cma]) for cma in masx])
        )

    ma = mas[0 if not check else algo]

    if print_:
        cchars = ['.'] + [chr(ii) for ii in range(ord('B'), ord('Z') + 1)] # These chars are used to show clusters from largest to smallest
        res = [[grid[ii][jj] for jj in range(c)] for ii in range(r)]
        mac = [[ma[ii][jj] for jj in range(c)] for ii in range(r)]
        processed = set()
        sid = 0
        for it in range(r * c):
            sma = sorted(
                [(mac[ii][jj] or (0, 0, 0)) + (ii, jj) for ii in range(r) for jj in range(c) if (ii, jj) not in processed],
                key = lambda e: e[0] * e[1], reverse = True
            )
            if len(sma) == 0 or sma[0][0] * sma[0][1] <= 0:
                break
            maxv = sma[0]
            if it == 0:
                maxvf = maxv
            processed.add((maxv[3], maxv[4]))
            show = True
            for trial in [True, False]:
                for i in range(maxv[3] - maxv[0] + 1, maxv[3] + 1):
                    for j in range(maxv[4] - maxv[1] + 1 - (maxv[3] - i) * maxv[2], maxv[4] + 1 - (maxv[3] - i) * maxv[2]):
                        if trial:
                            if mac[i][j] is None:
                                show = False
                                break
                        elif show:
                            res[i][j] = cchars[sid]
                            mac[i][j] = None
            if show:
                sid += 1
            if not show_non_max and it == 0:
                break
        res = '\n'.join([''.join([str(res[ii][jj]) for jj in range(c)]) for ii in range(r)])
        print(
            'Max:\nArea: ', maxvf[0] * maxvf[1], '\nSize Row,Col: ', (maxvf[0], maxvf[1]),
            '\nLowerRight Row,Col: ', (maxvf[3], maxvf[4]), '\nAngle: ', ("-1", " 0", "+1")[maxvf[2] + 1], '\n', sep = ''
        )
        print(res)
        if debug:
            # Print all computed maximums, for debug purposes
            for cma in [ma, mac]:
                print('\n' + '\n'.join([' '.join([f'({e0[0]}, {e0[1]}, {("-1", " 0", "+1")[e0[2] + 1]})' for e0_ in e for e0 in (e0_ or ('-', '-', 0),)]) for e in cma]))
        print(end = '-' * 28 + '\n')
    
    return ma

# ----------------- Testing -----------------

def test():
    # Iterating over text inputs or other ways of producing inputs
    for s in [
        """
        1 1 0 0 0 1 0 1
        1 1 1 0 1 1 1 1
        1 0 0 0 1 0 1 1
        0 0 1 0 1 0 1 1
        1 1 1 1 0 0 1 1
        0 0 1 1 1 1 1 0
        0 1 0 0 1 0 1 1
        """,
        """
        1 0 1 1 0 1 0 0
        0 1 1 0 1 0 0 1
        1 1 0 0 0 0 0 1
        0 1 1 1 0 1 0 1
        0 1 1 1 1 0 1 1
        1 1 0 0 0 1 0 0
        0 1 1 1 0 1 0 1
        """,
        """
        0 1 1 0 1 0 1 1
        0 0 1 1 0 0 0 1
        0 0 0 1 1 0 1 0
        1 1 0 0 1 1 1 0
        0 1 1 0 0 1 1 0
        0 0 1 0 1 0 1 1
        1 0 0 1 0 0 0 0
        0 1 1 0 1 1 0 0
        """
    ]:
        solve(s, text = True)

if __name__ == '__main__':
    test()

Output:

Max:
Area: 8
Size Row,Col: (4, 2)
LowerRight Row,Col: (4, 7)
Angle:  0

CC000101
CC1011..
100010..
001010..
1BBB00..
00BBBDD0
010010DD
----------------------------
Max:
Area: 6
Size Row,Col: (3, 2)
LowerRight Row,Col: (2, 1)
Angle: -1

10..0100
0..01001
..000001
0BBB0101
0BBB1011
CC000100
0CC10101
----------------------------
Max:
Area: 12
Size Row,Col: (6, 2)
LowerRight Row,Col: (5, 7)
Angle: +1

0..01011
00..0001
000..010
BB00..10
0BB00..0
001010..
10010000
01101100
----------------------------
Arty
  • 14,883
  • 6
  • 36
  • 69
  • thank you! Can you please explain the following part in your simple algorithm: ry, rx = l, k + off * (l - i)... up until the append. Another one: if grid[l][max(0, j + off * (l - i)) : min(k + 1 + off * (l - i), c)] != ones[:k - j + 1]... checkers the angled rectangles, right? – W Szum Nov 01 '20 at 21:21
  • @WSzum I just wrote explanation of whole `Simple` algorithm in paragraphs `1.-11.` right before Python code. In order to answer your questions in your comment above read just paragraphs `7.-10.`. Although all explanation is quite long, I just wanted to describe all details, algorithm code itself is much shorter that its explanation. So I think Simple algirithm is almost simplest possible. – Arty Nov 02 '20 at 05:31
  • @WSzum BTW, I've just updated my answer by adding one more algorithm `Simple-ListComp`, it is almost same like `Simple`, same algorithm and complexity, but expressed as python's list comprehension instead of regular loops. You may put a look at it! – Arty Nov 02 '20 at 07:19
  • @WSzum Also by default I run all algorithms to check that all of them have same output, if you need to run always just `Simple` algorithm for economy of time you should pass `algo = 0, check = False` arguments to `solve()` function. Also if Simple algorithm is too slow and if in your case you're allowed to use [NumPy](https://numpy.org/) arrays instead of plain python's list, you may convert lists to numpy arrays and it will improve your speed by around `10x` times more. If you'll need to speed up your code tell me I'll write variant of algorithm with `NumPy` arrays. – Arty Nov 02 '20 at 07:35