3

So I have a matrix (n row by m column) and want to find the region with the most connected "1s". For example if I have the following matrix:

1 1 0 0
0 1 1 0
0 0 1 0
1 0 0 0

There are 2 regions of "1s" in the matrix.

1st region:

1 1
  1 1
    1

2nd region:

1

I would like to create an algorithm that will output the maximum = 5. I think this has something to do with Depth First Search but I only have base R and access to a few packages.

smci
  • 32,567
  • 20
  • 113
  • 146
Tony Hellmuth
  • 290
  • 2
  • 11

2 Answers2

3

You could use SDMTools. First, we convert the matrix into raster, then we detect clumps (patches) of connected cells. Each clump gets a unique ID. NA and zero are used as background values. Finally, PatchStat provides statistics for each patch.

library(raster)
library(SDMTools)

r <- raster(mat)    
rc <- clump(r)
as.matrix(rc)
     [,1] [,2] [,3] [,4] [,5]
[1,]   NA    1    1    1    1
[2,]    1   NA   NA    1   NA
[3,]    1    1    1   NA    1
[4,]   NA   NA   NA   NA   NA
[5,]    2    2   NA   NA   NA
p <- PatchStat(rc)
max(p$n.cell)  

[1] 10

Sample data

set.seed(2)
m <- 5
n <- 5
mat <- round(matrix(runif(m * n), m, n))
mat
     [,1] [,2] [,3] [,4] [,5]
[1,]    0    1    1    1    1
[2,]    1    0    0    1    0
[3,]    1    1    1    0    1
[4,]    0    0    0    0    0
[5,]    1    1    0    0    0
DJack
  • 4,850
  • 3
  • 21
  • 45
  • That is a sexy way. Even if clump sounds not so sexy. So much in R! Sadly I would like to do this without requiring those packages, or most really. Any ideas? I mean obviously I attempted it with a list containing each of the connected 1s. But that's about as far as I got. – Tony Hellmuth Apr 14 '18 at 07:55
  • 1
    If you want to compute it from scratch, you could dig into the source code of `clump` function (https://github.com/cran/raster/blob/master/R/clump.R). But it uses `adjacent` and `clusters` that comes from `raster` and `igraph` packages... – DJack Apr 14 '18 at 08:11
  • True, I know igraph is important here. Oh well I will keep at it and try some tricks. I mean I can get all the elements closest to each 1 in a list, very simple. Now I just have to merge that list into separate vectors that contain the unique connected points. Hmm... But thank you for your input! – Tony Hellmuth Apr 14 '18 at 08:14
  • 1
    Turns out igraph is available. `raster` is our downfall :( – Tony Hellmuth Apr 14 '18 at 08:22
  • If you already have the adjacent points then you only need to use this line `igraph::clusters(igraph::graph(adjv, directed=FALSE))$membership[val]` from `clump` source code. `adjv` is a vector with all connected cells (in my example `9 3 12 6 15 9 3 2 4 3 5 4 12 11 13 12 22 21 2 6 5 9 9 13 6 2 9 5 13 9 2 3 3 4 4 5 11 12 12 13 21 22 3 9 6 12 9 15 9 4 11 6 4 9 6 11`) and `val` is a vector with the cells with 1 (in my example `2 3 4 5 6 9 11 12 13 15 21 22`). This will give you a different value for each patch. – DJack Apr 14 '18 at 08:39
  • 1
    Perfect. Works like a charm. Just needs a little smoothing for interpretability! – Tony Hellmuth Apr 14 '18 at 08:46
  • @TonyHellmuth: if you preferred it be done without other packages, should edit that into the question. Is there a reason, or just curiosity? – smci Apr 14 '18 at 22:09
  • Oh it's just require admin rights to download packages. I guess curiosity is the major player now :). I did say only a few packages; maybe I should be slightly clearer. – Tony Hellmuth Apr 15 '18 at 02:40
2

I did this eventually with use of igraph:

library(igraph)
data<-scan("stdin")
n<-data[1]
m<-data[2]

mat<-matrix(data[3:(n*m+2)],nrow=n,ncol=m,byrow=TRUE)
labels <- as.vector(mat)
rows <- (seq(length(labels)) - 1) %% nrow(mat)
cols <- ceiling(seq(length(labels)) / nrow(mat))
g <- graph.lattice(dim(mat), nei=2)

# Remove edges between elements of different types or that aren't diagonal
edgelist <- get.edgelist(g)
retain <- labels[edgelist[,1]] == labels[edgelist[,2]] &
  abs(rows[edgelist[,1]] - rows[edgelist[,2]]) <= 1 &
  abs(cols[edgelist[,1]] - cols[edgelist[,2]]) <= 1
g <- delete.edges(g, E(g)[!retain])

y<-clusters(g)$membership ### clustered matrix as vector
m<-as.vector(mat) ### original matrix 
z<-y[m>0] ### ignore where original matrix is 0
cat(sort(table(z),decreasing=TRUE)[[1]])
Tony Hellmuth
  • 290
  • 2
  • 11