I have a very large pairs list that I need to break down into single linkage communities. So far I have been able to do this entirely in R just fine. But I need to prepare for the eventuality that the entire list may be too large to hold in memory, or for igraph's R implementation to handle. A very simple version of this task looks like:
library(igraph)

df <- data.frame("p1" = c("a", "a", "d", "d"),
                 "p2" = c("b", "c", "e", "f"),
                 "val" = c(0.5, 0.75, 0.25, 0.35))

g <- graph_from_data_frame(d = df,
                           directed = FALSE)

sg <- groups(components(g))
sg <- sapply(sg,
             function(x) induced_subgraph(graph = g,
                                          vids = x),
             USE.NAMES = FALSE,
             simplify = FALSE)
If df is incredibly large, on the scale of hundreds of millions to tens of billions of rows, is there a way for me to extract individual positions of sg without having to build g in its entirety? It's relatively easy for me to store representations of df outside of R, either as a compressed txt file or as a SQLite database.
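To make the goal concrete, here is a minimal sketch of the kind of approach I have in mind. It is not a tested solution: it streams the pairs in chunks and keeps only a node-to-parent map (a union-find structure) in memory, so g is never built. The function name component_members, the tab-separated layout with no header row, and the chunk size are all assumptions made for illustration, and the sketch still assumes that the set of distinct node names fits in memory.

```r
# Hypothetical helper (not from igraph): group node names into connected
# components by streaming an edge file, without building the full graph.
# Assumes a tab-separated file, no header, with node names in columns 1 and 2.
component_members <- function(path, chunk_size = 1e6) {
  uf <- new.env(hash = TRUE, parent = emptyenv())  # node name -> parent name

  find_root <- function(x) {
    if (!exists(x, envir = uf, inherits = FALSE)) {
      assign(x, x, envir = uf)  # first time seen: node is its own root
      return(x)
    }
    root <- x
    while ((p <- get(root, envir = uf)) != root) root <- p
    # Path compression: point every node on the walked path at the root.
    while (x != root) {
      nxt <- get(x, envir = uf)
      assign(x, root, envir = uf)
      x <- nxt
    }
    root
  }

  con <- gzfile(path, "r")  # gzfile() also reads uncompressed files
  on.exit(close(con))
  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0L) break
    for (f in strsplit(lines, "\t", fixed = TRUE)) {
      r1 <- find_root(f[1])
      r2 <- find_root(f[2])
      if (r1 != r2) assign(r1, r2, envir = uf)  # union the two components
    }
  }

  nodes <- ls(uf, all.names = TRUE)
  roots <- vapply(nodes, find_root, character(1))
  unname(split(nodes, roots))  # one character vector of node names per component
}

# Usage on the toy data from above, written to a temporary file.
df <- data.frame("p1" = c("a", "a", "d", "d"),
                 "p2" = c("b", "c", "e", "f"),
                 "val" = c(0.5, 0.75, 0.25, 0.35))
tmp <- tempfile(fileext = ".tsv")
write.table(df, tmp, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
members <- component_members(tmp, chunk_size = 2)
```

With the membership known, a second pass over the file could pull out only the rows belonging to one component and hand that subset to graph_from_data_frame, so each element of sg is built on its own. A plain R loop like this is likely to be slow at billions of rows, so I would treat it as an illustration of the idea rather than something to run at that scale as written.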