10

Given a basic grid (like a piece of graph paper), where each cell has been randomly filled in with one of n colors, is there a tried and true algorithm out there that can tell me what contiguous regions (groups of cells of the same color that are joined at the side) there are? Let's say n is something reasonable, like 5.

I have some ideas, but they all feel horribly inefficient.

8 Answers8

10

The best possible algorithm is O(number of cells), and is not related to the number of colors.

This can be achieved by iterating through the cells, and every time you visit one that has not been marked as visited, do a graph traversal to find all the contiguous cells in that region, and then continue iterating.

Edit:

Here's a simple pseudo code example of a depth first search, which is an easy to implement graph traversal:

function visit(cell) {
    if cell.marked return
    cell.marked = true
    foreach neighbor in cell.neighbors {
        if cell.color == neighbor.color {
            visit(neighbor)
        }
    }
}
recursive
  • 83,943
  • 34
  • 151
  • 241
  • Could you be a little more specific than, "do a graph traversal"? Would that be recursive? –  Jan 17 '09 at 00:07
  • The graph traversal would be a flood fill, as covered in some other answers. – Sparr Jan 17 '09 at 00:12
  • 2
    Uh, a depth-first search is a VERY bad idea; it's all too easy to run out of stack space. Please change that to a breadth-first search instead (or maintain your own stack)/ – ShreevatsaR Jan 18 '09 at 20:20
  • By definition a graph traversal is on top of the iterating through the cells, that's not O(n). – Tatarize Nov 18 '18 at 16:39
  • @Tatarize No, it is O(n) even in a naive implementation as long as graph traversal and cell iteration operate on the same data structure. Graph traversal is O(e) but that is O(n) for planar graphs. – collapsar Dec 03 '20 at 03:14
7

In addition to recursive's recursive answer, you can use a stack if recursion is too slow:

function visit(cell) {
    stack = new stack
    stack.push cell

    while not stack.empty {
        cell = stack.pop
        if cell.marked continue
        cell.marked = true
        foreach neighbor in cell.neighbors {
            if cell.color == neighbor.color {
                stack.push neighbor
            }
        }
    }
}
martinus
  • 17,736
  • 15
  • 72
  • 92
  • I think the idea is to call the `visit(cell)` for each cell in the matrix (graph) and then decide which connected region is the largest for each color. – Rami Alshareef Oct 21 '18 at 23:45
3

You could try doing a flood fill on each square. As the flood spreads, record the grid squares in an array or something, and colour them in an unused colour, say -1.

Rocketmagnet
  • 5,656
  • 8
  • 36
  • 47
3

The Wikipedia article on flood fill might be useful to you here: http://en.wikipedia.org/wiki/Flood_fill

Isvara
  • 3,403
  • 1
  • 28
  • 42
2

Union-find would work here as well. Indeed, you can formulate your question as a problem about a graph: the vertices are the grid cells, and two vertices are adjacent if their grid cells have the same color. You're trying to find the connected components.

The way you would use a union-find data structure is as follows: first create a union-find data structure with as many elements as you have cells. Then iterate through the cells, and union two adjacent cells if they have the same color. In the end, run find on each cell and store the response. Cells with the same find are in the same contiguous colored region.

A. Rex
  • 31,633
  • 21
  • 89
  • 96
0

If you want a little more fine grain control, you might think about using the A* algorithm and use the heuristic to include similarly colored tiles.

DShook
  • 14,833
  • 9
  • 45
  • 55
0

You iterate through the regions in a scanline, going left-right top-bottom. For each cell you make a list of cells shared as the same memory object between the cells. For each cell, you add the current cell to the list (either shared with it or created). Then if the cell to the right or below is the same color, you share that list with that cell. If that cell already has a list, you combine the lists and replace the reference to the list object in each cell listed in the lists with the new merged list.

Then located in each cell is a reference to a list that contains every contiguous cell with that cell. This aptly combines the work of the floodfill between every cell. Rather than repeating it for each cell. Since you have the lists replacing the data with the merged data is just iterating through a list. It will be O(n*c) where n is the number of cells and c is a measure of how contiguous the graph is. A completely disjointed grid will be n time. A completely contiguous 1 color graph with be n^2/2.

Tatarize
  • 10,238
  • 4
  • 58
  • 64
0

I heard this question in a video and also found it here and I came up with what is the best approach I have seen in my searching. Here are the basic steps of the algorithm:

  • Loop through the array (assuming the grid of colors is represented as a 2-dimensional array) from top-left to bottom-right.
  • When you go through the first row just check the color to the left to see if it is the same color. When you go through all subsequent rows, check the cell above and the cell to the left - this is more efficient than checking to the top, bottom, left and right every time. Don't forget to check that the left cell is not out of bounds.
  • Create a Dictionary of type <int,Dictionary<int,Hashset<cell>>> for storing colors and groups within those colors. The Hashset contains cell locations (cell object with 2 properties: int row, int column).
  • If the cell is not connected at the top or left to a cell of the same color then create a new Dictionary entry, a new color group within that entry, and add the current cell to that group (Hashset). Else it is connected to another cell of the same color; add the current cell to the color group containing the cell it's connected to.
  • If at some point you encounter a cell that has the same color at the top and left, if they both belong to the same color group then that's easy, just add the current cell to that color group. Else check the kitty-corner cell to the top-left. If it is a different color than the current cell and the cell to the top and cell to the left belong to different color groups --> merge the 2 color groups together; add the current cell to the group.
  • Finally, loop through all of the Hashsets to see which one has the highest count - this will be the return value.

Here is a link to a video I made with visual and full explanation: https://d.tube/#!/v/israelgeeksout77/wm2ax1vpu3y

P.S. I found this post on GeeksForGeeks https://www.geeksforgeeks.org/largest-connected-component-on-a-grid/

They conveniently posted source code to this problem in several languages! But I tried their code vs. mine and mine ran in about 1/3 of the time.

SendETHToThisAddress
  • 2,756
  • 7
  • 29
  • 54