Possibility of speeding up transitive closure list updater

Question

I have created the following function:

update_list3 = function(identity_dict){
    suppressMessages(suppressWarnings({
        library(ggm)
        # build an n x n adjacency matrix: row i has a 1 in every column listed under key i
        g = matrix(unlist(lapply(identity_dict, FUN = function(x){
            b = rep(0, length(identity_dict))
            b[x] = 1
            b})),
            nrow = length(identity_dict), byrow = TRUE)
        # compute the transitive closure
        closure = transClos(g)
        # turn the closure matrix back into an adjacency list
        out_list = lapply(as.list(data.frame(t(closure))), FUN = function(x){which(x == 1)})
        # transClos removes self-connections, so bind each key back into its own set
        out_list = mapply(c, as.list(seq_along(out_list)), out_list, SIMPLIFY = FALSE)
        lapply(out_list, FUN = sort)
    }))
}

To update a list, identity_dict, of the form:

1: {1,3,4}
2: {2,5}
3: {3,4}
4: {4}
5: {5}

to provide the transitive closure of the undirected graph, with the form:

1: {1,3,4}
2: {2,5}
3: {1,3,4}
4: {1,3,4}
5: {2,5}
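
The target form can also be stated as a property that is easy to check: every key lists itself, and every member of a key's set carries exactly the same set. Below is a minimal sketch of such a check. `is_closed` is a hypothetical helper name, not part of the original function, and it assumes the list is ordered so that position `k` holds the set for key `k`.

```r
# Hypothetical checker (an assumption, not part of update_list3).
# Assumes position k of the list holds the set for key k.
is_closed <- function(x) {
  all(vapply(seq_along(x), function(k) {
    members <- x[[k]]
    # the key must list itself, and every member must carry the same set
    k %in% members &&
      all(vapply(members, function(m) setequal(x[[m]], members), logical(1)))
  }, logical(1)))
}

before <- list(c(1, 3, 4), c(2, 5), c(3, 4), 4, 5)
after  <- list(c(1, 3, 4), c(2, 5), c(1, 3, 4), c(1, 3, 4), c(2, 5))
is_closed(before)
is_closed(after)
```

A check like this may help when comparing alternative implementations against each other on larger inputs.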

This works for small problems, but larger problems begin to stall quite quickly. Would there be a possibility of performing the same calculation by different means - perhaps the matrix transformation is limiting due to size?

An ideal solution would not use iGraph, as we have had problems with this package returning errors for large problems.

The function returns the intended output, but its run time grows rapidly with input size, so it does not scale to larger problems in reasonable time.

I would like to respond to _"An ideal solution would not use iGraph, as we have had problems with this package returning errors for large problems."_ Keep in mind that igraph is the work of volunteers _like you_ who offered their free time to create and maintain this open-source package. _You_ can contribute too, for example by reporting any problems you encounter, so that they can be fixed. — Szabolcs, Jul 28 '23 at 11:57
You say you want _"to provide the transitive closure of the undirected graph"_. Transitive closure is not a useful concept for _undirected_ graphs. It is simply a disjoint union of complete graphs on each connected component. Consider if this is really what you want to compute. — Szabolcs, Jul 28 '23 at 11:59
@Szabolcs regarding your first comment, I completely understand and, when I am able to trace the exact error my iGraph implementation returns, one of my first aims is to get this communicated as a potential bug. regarding your second comment, the goal of this tool is to attempt to link several unlinked trees - one of the steps is therefore "establishing a link" and cascading any relationships throughout the whole, newly linked component. I realise it's a potentially unusual application, but it does work well for our means. thanks for your advice :) — ABuist, Jul 31 '23 at 08:16
I don't quite understand your description, but I assume you just want a quick way to check if two vertices are within the same connected component? This is precisely what "transitive closure" is for an undirected graph. Then simply compute connected components. `c <- components(g)`. If `c$membership[v] == c$membership[u]` then `v` and `u` are in the same component. _Do not construct the transitive closure for this!_ It will have a quadratic number of edges and it will be impossible to store for large graphs. — Szabolcs, Jul 31 '23 at 08:31

ThomasIsCoding · Answer 1 · 2023-07-28T21:24:38.913

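If you would like to avoid using igraph, you can use the following base R code, which repeatedly expands the first remaining key's set until no new members appear and then assigns the resulting group to every member: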

dict <- identity_dict
out <- list()
repeat {
    if (length(dict) == 0) {
        break
    }
    d <- as.character(dict[[1]])
    repeat {
        d2 <- dict[as.character(unique(unlist(d)))]
        v <- as.character(unique(unlist(d2)))
        if (length(setdiff(v, d)) == 0) {
            dict <- dict[!names(dict) %in% v]
            out[v] <- rep(list(as.integer(v)), length(v))
            break
        } else {
            d <- v
        }
    }
}
identity_dict <- out[names(identity_dict)]

which finally gives

> identity_dict
$`1`
[1] 1 3 4

$`2`
[1] 2 5

$`3`
[1] 1 3 4

$`4`
[1] 1 3 4

$`5`
[1] 2 5
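
As a possible alternative for larger inputs, the same grouping can be sketched with a union-find (disjoint-set) structure in base R. This is a different technique from the loop above, not a rewrite of it. The function name `closure_union_find` is made up for illustration, and the sketch assumes the keys are the integers 1..n stored in that order, with every member being one of those keys.

```r
# Sketch only: union-find over integer keys 1..n.
# Assumes identity_dict[[k]] holds the members for key k, all within 1..n.
closure_union_find <- function(identity_dict) {
  n <- length(identity_dict)
  parent <- seq_len(n)

  # Follow parent pointers to the root, then compress the path behind us.
  find_root <- function(i) {
    root <- i
    while (parent[root] != root) root <- parent[root]
    while (parent[i] != root) {
      nxt <- parent[i]
      parent[i] <<- root
      i <- nxt
    }
    root
  }

  # Merge each key with every member listed under it.
  for (key in seq_len(n)) {
    for (member in as.integer(identity_dict[[key]])) {
      a <- find_root(key)
      b <- find_root(member)
      if (a != b) parent[max(a, b)] <- min(a, b)
    }
  }

  roots <- vapply(seq_len(n), find_root, integer(1))
  groups <- split(seq_len(n), roots)           # one integer vector per component
  out <- unname(groups[as.character(roots)])   # expand back to one entry per key
  names(out) <- names(identity_dict)
  out
}

res <- closure_union_find(list(
  `1` = c(1, 3, 4), `2` = c(2, 5), `3` = c(3, 4), `4` = 4, `5` = 5
))
```

The `roots` vector on its own already answers "are these two keys linked?", so the final expansion to a full list is only needed if the list form is actually required downstream.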

If igraph works for you (I don't know what kind of errors you have encountered with it), I think `components()` and `membership()` are what you need, e.g.,

library(igraph) # also re-exports %>% from magrittr

# group info: component id for each vertex
grp <- stack(identity_dict) %>%
    graph_from_data_frame() %>%
    components() %>%
    membership()

# update `identity_dict`
identity_dict[] <- ave(
    as.integer(names(grp)),
    grp,
    FUN = list
)[match(names(identity_dict), names(grp))]

such that

> identity_dict
$`1`
[1] 1 3 4

$`2`
[1] 2 5

$`3`
[1] 1 3 4

$`4`
[1] 1 3 4

$`5`
[1] 2 5
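
If the expanded list is not strictly needed, the membership vector alone is enough to test whether two keys are linked, as suggested in the comments on the question. The sketch below assumes igraph is installed. `component_membership` and `same_component` are hypothetical helper names introduced here for illustration.

```r
# Hypothetical helpers; assume igraph is installed and the list is named by key.
component_membership <- function(identity_dict) {
  edges <- stack(identity_dict)   # columns: values (member), ind (key)
  g <- igraph::graph_from_data_frame(edges, directed = FALSE)
  igraph::components(g)$membership   # named vector: vertex name -> component id
}

# TRUE when keys u and v fall in the same connected component.
same_component <- function(membership, u, v) {
  membership[[as.character(u)]] == membership[[as.character(v)]]
}
```

With the example data, `same_component(component_membership(identity_dict), 1, 4)` would compare the component ids of keys 1 and 4. This stores one id per key rather than a full member list per key.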

Data

identity_dict <- list(
    `1` = c(1, 3, 4),
    `2` = c(2, 5),
    `3` = c(3, 4),
    `4` = 4,
    `5` = 5
)
