1

I have a python numpy matrix with 1´s and 0´s, I need to identify the largest "collection" of 1's in the matrix: https://i.stack.imgur.com/DbxOq.jpg

The matrix can have up to 960.000 elements so I would like to avoid a brute-force solution.

What is the smartest way to go about solving this problem?

NicolaiF
  • 1,283
  • 1
  • 20
  • 44
  • Have you given an attempt at this already? How about sharing what you have so far? – Jake Griffin Apr 23 '15 at 19:13
  • I have an implementation where I divide the matrix into sub matrices, then I take the sum of these sub matrices and find the largest one, I do this twice and then I brute force the solution by checking adjacent fields and adding the component to a cluster – NicolaiF Apr 23 '15 at 19:15
  • If you don't need your own implementation (if it's a homework assignment for example), check out NetworkX's python module (https://networkx.github.io/) and the related question here: http://stackoverflow.com/questions/24374627/networkx-largest-component-no-longer-working – Jake Griffin Apr 23 '15 at 19:20

2 Answers2

3

You can use a data structure called disjoint-set (here is a python implementation). This data structure was designed for this kind of task.

You iterate over the rows if the current element is 1, check if any of the already traversed neighbors are 1. If so add this element to its set. If there are more than 1 union those sets. If no neighbors are 1 create a new set. At the end output the largest set.

This would work as follows:

def MakeSet(x):
  x.parent = x
  x.rank   = 0
  x.size = 1

def Union(x, y):
  xRoot = Find(x)
  yRoot = Find(y)
  if xRoot.rank > yRoot.rank:
    yRoot.parent = xRoot
  elif xRoot.rank < yRoot.rank:
    xRoot.parent = yRoot
  elif xRoot != yRoot: # Unless x and y are already in same set, merge them
    yRoot.parent = xRoot
    xRoot.rank = xRoot.rank + 1
  x.size += y.size
  y.size = x.size

def Find(x):
  if x.parent == x:
    return x
  else:
    x.parent = Find(x.parent)
    return x.parent

""""""""""""""""""""""""""""""""""""""""""

class Node:
  def __init__ (self, label):
    self.label = label
  def __str__(self):
    return self.label

rows = [[1, 0, 0], [1, 1, 0], [1, 0, 0]]
setDict = {}
for i, row in enumerate(rows):
  for j, val in enumerate(row):
    if row[j] == 0:
      continue
    node = Node((i, j))
    MakeSet(node)
    if i > 0:
      if rows[i-1][j] == 1:
        disjointSet = setDict[(i-1, j)]
        Union(disjointSet, node)
    if j > 0:
      if row[j-1] == 1:
      disjointSet = setDict[(i, j-1)]
      Union(disjointSet, node)
    setDict[(i, j)] = node
print max([l.size for l in setDict.values()])

>> 4

This is a full working example with code for disjoint set taken from the link above.

Benjy Kessler
  • 7,356
  • 6
  • 41
  • 69
  • Actually if you know the edges in advance disjoint-set is somewhat overkill. you can just do a graph traversal instead – Niklas B. Apr 23 '15 at 19:29
2

I think the count will be off in the provided answer. E.g. if rows is changed to rows = [[1, 0, 0], [1, 1, 1], [1, 0, 0]] still getting 4, though it should be 5. Changing Union to

def Union(x, y):
  xRoot = Find(x)
  yRoot = Find(y)
  if xRoot.rank > yRoot.rank:
    yRoot.parent = xRoot
    xRoot.size += yRoot.size
  elif xRoot.rank < yRoot.rank:
    xRoot.parent = yRoot
    yRoot.size += xRoot.size
  elif xRoot != yRoot:  # Unless x and y are already in same set, merge them
    yRoot.parent = xRoot
    xRoot.rank = xRoot.rank + 1
    xRoot.size += yRoot.size

seems to fix.

Ken
  • 508
  • 1
  • 4
  • 15