I'm creating a weighted igraph network object from an edge list containing two columns, from and to. It has proven somewhat challenging: with my workaround the network metrics change, so I believe I'm doing something wrong.

library(igraph)
links <- read.csv2("edgelist.csv")
vertices <- read.csv2("vertices.csv")
network <- graph_from_data_frame(d=links,vertices = vertices,directed = TRUE)

## The following step removes the self-loops that I used to include all isolated nodes in the network

network <- simplify(network,remove.multiple = FALSE, remove.loops = TRUE)

At this point I have successfully created a network object. However, it is not weighted. Therefore I create a second network object by taking the adjacency matrix of the object created earlier and building a new igraph object from it, like this:

gettheweights <- get.adjacency(network)
network2 <- graph_from_adjacency_matrix(gettheweights,mode = "directed",weighted = TRUE)

However, when I then print both objects, I notice a difference in the number of edges. Why is this?

network2
IGRAPH ef31b3a DNW- 200 1092 --  

network
IGRAPH 934d444 DN-- 200 3626 -- 

Additionally, I believe I've done something wrong because, if they were indeed the same network, shouldn't their densities be the same? They are not:

graph.density(network2)
[1] 0.02743719

graph.density(network)
[1] 0.09110553

I browsed and tried several answers found here, but none matched my case exactly and I failed to find a solution.

voppikode
  • Fishy. Are you sure that you didn't build `network2` from the simplified network while `network` was un-simplified? If you run: `g <- erdos.renyi.game(100,250,'gnm', directed=T)` `graph.density(g) == graph.density(graph_from_adjacency_matrix(get.adjacency(g)))` the density should really be the same. Having some sample data to replicate the output from your code would be helpful. – nJGL Sep 21 '21 at 11:09
  • @nJGL Indeed, running that code returns TRUE, so it has to be me doing something wrong. The original edge list I'm working with contains several identical edges multiple times (A-B; A-B; A-B; etc.). There are 3626 edges in total, but I believe the changed number of edges (from 3626 to 1092) actually refers to the unique edges. – voppikode Sep 21 '21 at 14:03
  • We shall get to the bottom of this. Neither self-loops nor double edges are a problem when transferring via an adjacency matrix. If you look at this: `g <- make_empty_graph(directed=T) %>% add_vertices(5) %>% add_edges(c(1,2,1,3,1,4,2,5,4,5,1,2,1,1));gg <- graph_from_adjacency_matrix(get.adjacency(g));get.adjacency(g);plot(g);plot(gg)` you'll see that `get.adjacency(g)` shows the weight `1` on 1->1 for the loop and the weight `2` on 1->2 for the double link from 1 to 2 in a directed (default) graph. – nJGL Sep 22 '21 at 07:32

1 Answer

All seems to be in order. When you re-project a network so that duplicate edges are represented by a single edge whose weight is the number of edges between the given vertices, the density of your graph should change.

When you test graph.density(network2) and graph.density(network), they should differ if the duplicate edges were indeed reduced to single edges with weight as an edge attribute, as your output from network2 and network suggests.
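As a quick check, the two densities in the question can be reproduced by hand. This is a minimal sketch assuming the usual definition for a directed graph without self-loops, edges / (n * (n - 1)); the helper `directed_density` is a name made up for illustration, and the counts are copied from the printed summaries in the question.

```r
# Vertex and edge counts copied from the printed summaries in the question
n <- 200
edges_multi  <- 3626  # network:  duplicate edges kept
edges_unique <- 1092  # network2: duplicates collapsed into weights

# Hypothetical helper: density of a directed graph without self-loops
directed_density <- function(m, n) m / (n * (n - 1))

directed_density(edges_multi, n)   # about 0.0911, as reported for network
directed_density(edges_unique, n)  # about 0.0274, as reported for network2
```

Both reported values follow from the edge counts alone, which suggests the drop in density is simply the drop from 3626 edges to 1092 unique ones.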

The following (over-)commented code goes through the process:

library(igraph)

# Data that should resemble yours
edges <- data.frame(from=c("A","B","C","D","E","A","A","A","B","C"),
                    to  =c("A","C","D","A","B","B","B","C","B","D"))
vertices <- unique(unlist(edges))
# Building the graph in the same way as you do
g0 <- graph_from_data_frame(d=edges, vertices=vertices, directed = TRUE)

# Note that the graph is "DN--": directed, named, but NOT weighted, since
# instead of weighted edges, we have a whole lot of double edges
(g0)
plot(g0)

# We can see the double edges in the adjacency matrix as values > 1
get.adjacency(g0)

# Use simplify to remove LOOPS ONLY, as we can see in the adjacency matrix test
g1 <- simplify(g0, remove.multiple = FALSE, remove.loops = TRUE)
get.adjacency(g1) == get.adjacency(g0)

# Turn the multiple edges into edge-weights by jumping through an adjacency matrix
g2 <- graph_from_adjacency_matrix(get.adjacency(g1), mode = "directed", weighted = TRUE)

# Instead of multiple edges (like many links between "A" and "B"), there are now
# just single edges with weights (hence the density of the network's changed).
graph.density(g1) == graph.density(g2)

# The former double edges are now here:
E(g2)$weight

# And we can see that the g2 is now "Named-Directed-Weighted" where g1 was only
# "Named-Directed" and no weights.
(g1);(g2)

# Let's plot the weights
E(g2)$width = E(g2)$weight*5
plot(g2)

A shortcoming of this/your method, however, is that the adjacency matrix is able to carry only the edge-count between any given vertices. If your edge-list contains more variables than i and j, the use of graph_from_data_frame() would normally embed edge-attributes of those variables for you straight from your csv-import (which is nice).

When you convert the edges into weights, however, you would lose that information. And, come to think of it, that information would have to be "converted" too. What would we do with two edges between the same vertices that have different edge-attributes?
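The two points above can be seen on a tiny example. This is only a sketch: the edge list `el` and its `hotness` column are made up for illustration, and `as_adjacency_matrix()` is the newer name for `get.adjacency()`.

```r
library(igraph)

# Hypothetical edge list with one extra column besides from/to
el <- data.frame(from    = c("A", "A", "B"),
                 to      = c("B", "B", "C"),
                 hotness = c(5, 6, 8))

# Extra columns are embedded as edge attributes
g <- graph_from_data_frame(el, directed = TRUE)
edge_attr_names(g)

# The adjacency matrix only carries edge counts, so after the round trip
# the only edge attribute left is the weight
gw <- graph_from_adjacency_matrix(as_adjacency_matrix(g, sparse = FALSE),
                                  mode = "directed", weighted = TRUE)
edge_attr_names(gw)
E(gw)$weight
```

The two A->B edges carried different `hotness` values (5 and 6), and nothing in the adjacency matrix says how they should be merged.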

At this point, the answer goes slightly beyond your question, but it stays in the realm of explaining the relation between graphs with multiple edges between the same vertices and their representation as weighted graphs with only one structural edge per vertex pair.

To carry edge-attributes along in this transformation into a weighted graph, I suggest using dplyr to "rebuild" the edge-attributes manually, so that you keep control of how they are merged when recasting into a weighted graph.

This picks up where the code above left off:

# Let's imagine that our original network had these two edge-attributes
E(g0)$coolness <- c(1,2,1,2,3,2,3,3,2,2)
E(g0)$hotness <- c(9,8,2,3,4,5,6,7,8,9)
# Plot the hotness
E(g0)$color <- colorRampPalette(c("green", "red"))(10)[E(g0)$hotness]
plot(g0)
# Note that the hotness values of the two edges from C to D are very different

# When we make your transformations for a weighted network, we lose the coolness
# and hotness information
g2 <- g0 %>% simplify(remove.multiple = FALSE, remove.loops = TRUE) %>%
  get.adjacency() %>%
  graph_from_adjacency_matrix(mode = "directed", weighted = TRUE)
E(g2)$hotness # NULL: naturally, the edge-attributes were lost!

# We can use dplyr to take control over how we'd like the edge-attributes transferred
# when multiple edges in g0 with different edge attributes are supposed to merge into
# one single edge
library(dplyr)
library(tidyr) # provides unite()
recalculated_edge_attributes <- 
data.frame(name = ends(g0, E(g0)) %>% as.data.frame() %>% unite("name", V1:V2, sep="->"),
           hotness = E(g0)$hotness) %>%
  group_by(name) %>%
  summarise(mean_hotness = mean(hotness))

# We used a string-version of the names of connected vertices (like "A->B") to refer
# to the attributes of each edge. This can now be used to merge back the re-calculated
# edge-attributes onto the weighted graph in g2
g2_attributes <- data.frame(name = ends(g2, E(g2)) %>% as.data.frame() %>% unite("name", V1:V2, sep="->")) %>%
  left_join(recalculated_edge_attributes, by="name")
# And manually re-attach our mean-attributes onto the g2 network
E(g2)$mean_hotness <- g2_attributes$mean_hotness
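# Means can be fractional (e.g. 5.5), so round them before using them as palette indices
E(g2)$color <- colorRampPalette(c("green", "red"))(ceiling(max(E(g2)$mean_hotness)))[round(E(g2)$mean_hotness)]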

# Note how the link between C and D has turned into the brown mean of the two previous
# green and red hotness-edges
plot(g2)
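A possible alternative to the adjacency-matrix and dplyr route, not the method used above: igraph's `simplify()` takes an `edge.attr.comb` argument that says how each edge attribute should be combined when duplicate edges are merged. The sketch below rebuilds the toy data so it runs on its own; the name `g3` and the choice of "mean" for hotness are assumptions for illustration.

```r
library(igraph)

# Same toy edge list and hotness values as above
edges <- data.frame(from = c("A","B","C","D","E","A","A","A","B","C"),
                    to   = c("A","C","D","A","B","B","B","C","B","D"))
g0 <- graph_from_data_frame(edges, directed = TRUE)
E(g0)$hotness <- c(9,8,2,3,4,5,6,7,8,9)

# Give every edge a weight of 1 so that summing duplicates yields the edge count
E(g0)$weight <- 1

# Merge duplicates and drop loops in one step:
# sum the weights, average the hotness, ignore any other edge attribute
g3 <- simplify(g0, remove.multiple = TRUE, remove.loops = TRUE,
               edge.attr.comb = list(weight = "sum", hotness = "mean", "ignore"))

# Inspect the merged edges together with their combined attributes
merged <- igraph::as_data_frame(g3, what = "edges")
merged
```

Whether this is preferable depends on how much control is needed: `edge.attr.comb` covers common summaries such as sum, mean, min and max, or a custom function, while the dplyr approach keeps every intermediate step visible.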

Sometimes, your analyses may benefit from either structure (weighted without duplicates, or unweighted with duplicates). Shortest-path algorithms, for example, can incorporate edge weights as described in this answer, but other analyses might not allow for the weighted version of your network data, or might not be intuitive when using it.
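To illustrate the shortest-path case, here is a small sketch with a made-up weighted edge list. Note that `distances()` treats the weight as the length of an edge, so an edge-count weight makes heavily connected pairs look further apart; inverting the weight, as in the last call, is only one possible choice and is an assumption for illustration.

```r
library(igraph)

# Hypothetical weighted edge list; weight = number of parallel edges that were merged
el <- data.frame(from   = c("A", "B", "A"),
                 to     = c("B", "C", "C"),
                 weight = c(2, 1, 5))
g <- graph_from_data_frame(el, directed = TRUE)

# By default the "weight" edge attribute is used as the length of each edge
d_weighted <- distances(g, v = "A", to = "C", mode = "out")

# weights = NA ignores the weights and counts hops instead
d_hops <- distances(g, v = "A", to = "C", mode = "out", weights = NA)

# Assumption: many parallel edges mean "closer", so invert the weights
d_inverse <- distances(g, v = "A", to = "C", mode = "out",
                       weights = 1 / E(g)$weight)
```

The three calls give different answers for the same pair of vertices, which is why the meaning of the weight should be settled before running such an analysis.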

Let purpose guide your structure.

nJGL