Calculate complexity of algorithm and approach

Question

I have written code to count the groups of 1s in a binary matrix, following on from my earlier question (linked here).

code

def groupcheck(i, j, matrix):
    if 0 <= i < len(matrix) and 0 <= j < len(matrix):
        if matrix[i][j]:
            matrix[i][j] = 0
            for dx, dy in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                groupcheck(i + dx, j + dy, matrix)


def countGroup(matrix):
    count = 0

    for i in range(len(matrix)):
        for j in range(len(matrix)):
            if matrix[i][j]:
                count += 1
                groupcheck(i, j, matrix)
    return count

matrix = [
    [1,1,0,0],
    [1,1,1,0],
    [0,1,1,0],
    [0,0,0,1] 
]

group = countGroup(matrix)
print(group)

Can someone please help me calculate the complexity of this algorithm and tell me what kind of approach it is? Also, is there a better approach than this?

My own guess at the complexity and approach (please correct me if I'm wrong):

complexity: O(n^2*4) (n is the side length of the square matrix)
approach: brute force

I'm still learning, so please explain in detail if possible.

There is no meaning to a `*4` inside `O( )`, as the definition of `O( )` specifically ignores constants. I don't know why you call this approach "brute-force". Finally, you are expressing the complexity as a function of `n` but you haven't stated what `n` is. Is it the number of elements in the matrix? The length of the side of the square matrix? The number of groups of 1s? Note that the matrix doesn't need to be square for your algorithm to make sense (for a non-square matrix you just need to change the test `0 <= j < len(matrix)`, because `len(matrix)` is the height, not the width). – Stef, Nov 01 '20 at 10:45
I am also a bit confused because this algorithm counts the number of connected components in the "drawing" of the matrix, whereas the other question you linked appeared to be about connected components in the graph whose adjacency matrix is this matrix. Although both problems are "counting the number of connected components", they use the matrix to represent different graphs, so the solution will be different. – Stef, Nov 01 '20 at 10:56
Here `n` stands for the side length of the square matrix, and here it is just a square matrix. As for the approach, can you please tell me what it is called, if not "brute-force"? I'm still learning, so it would be really helpful for me. – Avanish Tiwari, Nov 01 '20 at 10:57
Consider for instance the matrix: `[[1,1,0,1],[1,1,0,0],[0,0,1,0],[1,0,0,1]]`. Are there two groups ({0,1,3}, {2}), or three groups ({00, 01, 10, 11}, {22}, {33})? – Stef, Nov 01 '20 at 10:58
Can you help me with the correct solution then? While I was trying to solve the linked question I arrived at this approach, and it gave me the correct output. – Avanish Tiwari, Nov 01 '20 at 11:00
Are you sure there are three groups? This wouldn't be consistent with your other question. Note that there is a `1` in `matrix[0][3]`, so shouldn't `1` and `3` be in the same group? – Stef, Nov 01 '20 at 11:06
Sorry, I missed the last element; there are two groups. I have updated the comment. – Avanish Tiwari, Nov 01 '20 at 11:08
Can you help me with the correct solution and approach for this problem then? – Avanish Tiwari, Nov 01 '20 at 11:14

Stef · Accepted Answer · 2020-11-01T21:31:10.177

The problem you are trying to solve is called counting the connected components of a graph.

However, which graph are we talking about? Is it:

the grid, where each cell of the square matrix is a node in the graph, adjacent to its neighbouring cells; or
the graph whose adjacency matrix is this square matrix?

Consider for instance the following matrix:

[[1,1,0,1],
 [1,1,0,0],
 [0,0,1,0],
 [1,0,0,1]]

Your algorithm counts 5 groups in this matrix. This is expected because there are visually five groups in the grid:

[[A,A,0,B],
 [A,A,0,0],
 [0,0,C,0],
 [D,0,0,E]]

However, this matrix is the adjacency matrix of the following graph:

0 - 1
|
3   2

Which, as you can see, only has two groups {0, 1, 3} and {2}.
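To make the difference concrete, here is a minimal sketch that counts the components under both readings of the same matrix. The function names count_grid_components and count_graph_components are made up for this illustration, both use an explicit stack rather than recursion, and the second one assumes the adjacency matrix is symmetric (an undirected graph).

```python
def count_grid_components(matrix):
    """Cells holding 1 are nodes; up/down/left/right neighbours are adjacent."""
    n = len(matrix)  # assumes a square matrix, as in the example
    seen = set()
    count = 0
    for start in ((i, j) for i in range(n) for j in range(n)):
        if not matrix[start[0]][start[1]] or start in seen:
            continue
        count += 1
        seen.add(start)
        stack = [start]
        while stack:
            x, y = stack.pop()
            for nx, ny in ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)):
                if (0 <= nx < n and 0 <= ny < n
                        and matrix[nx][ny] and (nx, ny) not in seen):
                    seen.add((nx, ny))
                    stack.append((nx, ny))
    return count


def count_graph_components(matrix):
    """Indices 0..n-1 are nodes; i and j are adjacent when matrix[i][j] is 1."""
    n = len(matrix)
    seen = set()
    count = 0
    for start in range(n):
        if start in seen:
            continue
        count += 1
        seen.add(start)
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if matrix[i][j] and j not in seen:
                    seen.add(j)
                    stack.append(j)
    return count


example = [
    [1, 1, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 1],
]
print(count_grid_components(example))   # 5
print(count_graph_components(example))  # 2
```

The same input gives 5 under the grid reading and 2 under the adjacency-matrix reading, so it matters which graph the matrix is meant to describe.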

How to fix it

As far as I can see, your algorithm works perfectly to count the number of connected components in the grid. But that is not what you are interested in. You are interested in the number of connected components in the graph represented by this adjacency matrix. You can keep your functions groupcheck and countGroup, whose logic is good, but you should modify them so that a node of the graph is given by just one index i rather than by a pair of indices (i,j); and so that two nodes i and j are considered adjacent by groupcheck if there is a 1 in matrix[i][j].

Your function groupcheck currently "erases" cells which have already been counted by setting their value to 0 with the line matrix[i][j] = 0.

I suggest replacing this by maintaining a set of unseen nodes.

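def groupcheck(i, matrix, unseen):
  # Visit every not-yet-seen node j that is adjacent to node i.
  for j in range(len(matrix)):
    if (j in unseen) and matrix[i][j]:  # if i and j are adjacent
      unseen.discard(j)  # mark j as seen before recursing, so it is visited only once
      groupcheck(j, matrix, unseen)

def countGroup(matrix):
  count = 0
  unseen = set(range(len(matrix)))  # every node index starts out unseen
  while unseen:
    i = unseen.pop()  # pick any unseen node: it starts a new group
    count += 1
    groupcheck(i, matrix, unseen)  # remove the rest of its group from unseen
  return count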

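Complexity analysis: a naive upper bound is that countGroup costs n times the complexity of groupcheck, and a single call to groupcheck can trigger up to n recursive calls, each containing a for-loop over n nodes, i.e. O(n^2) for one top-level call. That bound is pessimistic, though: every node is removed from unseen before groupcheck is called on it, so groupcheck runs at most n times over the whole execution of countGroup, and the total cost is O(n^2).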
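One way to sanity-check this analysis is to count the work directly. The sketch below is the same pair of functions with a hypothetical stats dictionary threaded through (the names count_group_instrumented, path_graph and stats are assumptions made for this example). It runs on path graphs 0 - 1 - ... - (n-1), where the recursion can go as deep as n.

```python
def groupcheck(i, matrix, unseen, stats):
    stats["calls"] += 1  # one more invocation of groupcheck
    for j in range(len(matrix)):
        stats["checks"] += 1  # one more adjacency test
        if (j in unseen) and matrix[i][j]:
            unseen.discard(j)
            groupcheck(j, matrix, unseen, stats)


def count_group_instrumented(matrix):
    stats = {"calls": 0, "checks": 0}
    count = 0
    unseen = set(range(len(matrix)))
    while unseen:
        i = unseen.pop()
        count += 1
        groupcheck(i, matrix, unseen, stats)
    return count, stats


def path_graph(n):
    """Adjacency matrix of the path 0 - 1 - ... - (n-1)."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]


for n in (4, 16, 64):
    count, stats = count_group_instrumented(path_graph(n))
    print(n, count, stats["calls"], stats["checks"])
```

In this sketch every node is removed from unseen before groupcheck is called on it, so the counters come out at n calls and n*n adjacency checks in total, which suggests the run as a whole is O(n^2). Note that the recursion depth can still reach n, so very large connected graphs may hit Python's recursion limit.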

edited Nov 01 '20 at 21:31

answered Nov 01 '20 at 11:27

Stef


Amazing, thank you for the solution. Can you tell me what this approach is called? Correct me if I'm wrong, but I think this is the BFS approach, and the complexity is O(n^2)? – Avanish Tiwari Nov 01 '20 at 11:37
Technically, if you inspect it carefully you'll find it's a depth-first traversal and not a breadth-first traversal. This is because of the recursive call `groupcheck(j, matrix, unseen)`, which will explore neighbours of `j` before coming back to the neighbours of `i` in the next iteration of the `for`-loop. But the truth is, the order of the traversal is irrelevant here; you could implement function `groupcheck` using breadth-first traversal instead, or even some kind of random-order traversal, and that would change neither the validity nor the complexity of this algorithm. – Stef Nov 01 '20 at 12:49
PS: When done on a grid, like you did with your first version, it is called "flood-fill" because it really looks like water flooding adjacent cells (or by analogy with the floodfill tool on paintbrush). But on an abstract graph we don't really call it flood-fill. – Stef Nov 01 '20 at 12:50
OK, understood. Can you tell me what exactly your approach is called, because I want to learn these approaches? Also, what is the complexity of your algorithm? – Avanish Tiwari Nov 01 '20 at 13:14
Well, hum, uh, the complexity would be `O(n^2)` if I had been careful when coding `groupcheck`, but it looks like it's actually `O(n^3)` because of the terrible way in which I implemented it :( – Stef Nov 01 '20 at 13:22
How come `O(n^3)`? There is one while loop over `n`, and inside the while loop there is one for loop over `n-1`. Also, if you wanted to optimise it, what would you change to make it better? – Avanish Tiwari Nov 01 '20 at 13:30
`groupcheck` as written is `O(n^2)` because of the combination of `for`-loop with recursive calls. If it makes `n` recursive calls and each call contains a `for`-loop, that's `n^2` operations. `countGroup` has complexity `O(n) * complexity(groupcheck)`, thus `O(n^3)` with this implementation of `groupcheck`. `groupcheck` should be O(n) if written in a better way. – Stef Nov 01 '20 at 13:33
Cool, understood. If you have time, please help me with the optimised code. Until then I will try to improve it. I hope the approach is called DFS. – Avanish Tiwari Nov 01 '20 at 13:37
I have edited my answer with a true O(n^2) algorithm. – Stef Nov 01 '20 at 13:47
Is the new approach still called DFS, or something else? – Avanish Tiwari Nov 01 '20 at 13:59
I still think it is `O(n^3)`, because the while loop is `O(n)`, inside the while loop there is `any()` which is `O(n)`, and inside `any()` there is a for loop, i.e. O(n). – Avanish Tiwari Nov 01 '20 at 14:04
No, the `for`-loop "inside" the `any` is the same loop as the `any` loop. The whole `any` test is only O(n). – Stef Nov 01 '20 at 14:05
This new algorithm is neither a breadth-first traversal nor a depth-first traversal. It's simply iterating on the nodes in the order of their index, and for every node checking if they have at least one neighbour which has already been seen. – Stef Nov 01 '20 at 14:09
`brute force` then? – Avanish Tiwari Nov 01 '20 at 14:16
No? "Brute force" means something along the lines of "Enumerating all possibilities". We are not enumerating possibilities. We are counting. – Stef Nov 01 '20 at 14:23
If I have understood the problem correctly, both of your solutions will fail for `[[1, 0, 1], [0, 1, 1], [1, 1, 1]]`. The answer should be 1. While your solution 1 can be corrected, I am not entirely sure what you are trying to do in solution 2, as it only checks nodes at distance 1. – รยקคгรђשค Nov 01 '20 at 14:54
@รยקคгรђשค Wow. That's terrible. I removed the second solution. I'll need to think about this with a clear head. – Stef Nov 01 '20 at 21:32
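For reference, the traversal can also be written with a queue instead of recursion, in line with the earlier comment that a breadth-first traversal would work equally well. This is a minimal sketch, not the answer's own code: count_components_bfs is a name made up here, and it assumes the adjacency matrix is symmetric.

```python
from collections import deque


def count_components_bfs(matrix):
    """Count connected components of the graph whose adjacency matrix is `matrix`."""
    n = len(matrix)
    unseen = set(range(n))
    count = 0
    while unseen:
        count += 1  # any node still unseen starts a new component
        queue = deque([unseen.pop()])
        while queue:
            i = queue.popleft()
            for j in range(n):
                if j in unseen and matrix[i][j]:
                    unseen.discard(j)  # mark before enqueueing so j is queued once
                    queue.append(j)
    return count


print(count_components_bfs([[1, 0, 1], [0, 1, 1], [1, 1, 1]]))  # 1
print(count_components_bfs([[1, 1, 0, 1], [1, 1, 0, 0], [0, 0, 1, 0], [1, 0, 0, 1]]))  # 2
```

Because nodes are followed through the queue until no unseen neighbour remains, nodes at any distance are reached, so the three-node matrix from the comment above comes out as a single group. Using a queue also avoids Python's recursion limit on large connected graphs.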
