
I want to compute closeness centrality on a network with disconnected components. The closeness function in igraph does not give meaningful results on such graphs (see).

Then I came across this site, which explains that closeness can also be measured on graphs with disconnected components.

The site suggests the following code:

# Load tnet
library(tnet)
 
# Load network 
# Node K is assigned node id 8 instead of 10 as isolates at the end of id sequences are not recorded in edgelists
net <- cbind(
  i=c(1,1,2,2,2,3,3,3,4,4,4,5,5,6,6,7,9,10,10,11),
  j=c(2,3,1,3,5,1,2,4,3,6,7,2,6,4,5,4,10,9,11,10),
  w=c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1))
 
# Calculate measures
closeness_w(net, gconly=FALSE)
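A minimal base-R sketch of the idea on that site, assuming the measure is the sum of inverse shortest-path distances from each node, so an unreachable node adds 0 instead of making the total infinite. The helper closeness_sum_inverse is hypothetical and not part of tnet. Treating weights as strengths and turning them into distances as mean(w)/w is my assumption about tnet's default alpha = 1, checked only against the numbers printed in the answers below. Floyd-Warshall keeps the code short but costs O(n^3), so it only suits small graphs.

```r
# Hypothetical helper, not part of tnet: closeness as the sum of inverse
# shortest-path distances, so unreachable nodes contribute 0 instead of Inf
closeness_sum_inverse <- function(from, to, w) {
  node <- sort(unique(c(from, to)))
  n <- length(node)
  # Assumption: w is a tie strength, converted to a distance as mean(w) / w;
  # assumes at most one edge per ordered (from, to) pair
  d <- matrix(Inf, n, n)
  diag(d) <- 0
  d[cbind(match(from, node), match(to, node))] <- mean(w) / w
  # Floyd-Warshall: allow paths through intermediate node k, one k at a time
  for (k in seq_len(n)) {
    d <- pmin(d, outer(d[, k], d[k, ], "+"))
  }
  inv <- 1 / d    # 1 / Inf = 0 for unreachable pairs
  diag(inv) <- 0  # a node's distance to itself is not counted
  closeness <- rowSums(inv)
  # Normalised by (number of distinct nodes - 1); tnet counts every id up to the max
  data.frame(node = node, closeness = closeness, n.closeness = closeness / (n - 1))
}

# Directed edge list of the tnet example above (all weights 1; node 8 never appears)
i <- c(1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 6, 6, 7, 9, 10, 10, 11)
j <- c(2, 3, 1, 3, 5, 1, 2, 4, 3, 6, 7, 2, 6, 4, 5, 4, 10, 9, 11, 10)
res <- closeness_sum_inverse(i, j, w = rep(1, length(i)))
res[res$node == 1, ]  # closeness 1 + 1 + 1/2 + 1/2 + 1/3 + 1/3 = 11/3
```

Passing the question's from, to and 1/sum(amount) columns should give about 1.379 for 5370, 0.0367 for 6390 and 5.80 for 8934, in line with the tnet output in the first answer, but only nodes that actually occur get a row.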

In my case, I have transaction data, so the network I build from it is directed and weighted. The weights are 1/(transaction amount).

This is my example data:

structure(list(id = c(2557L, 1602L, 18669L, 35900L, 48667L, 51341L
), from = c("5370", "6390", "5370", "5370", "8934", "5370"), 
    to = c("5636", "5370", "8933", "8483", "5370", "7626"), date = structure(c(13099, 
    13113, 13117, 13179, 13238, 13249), class = "Date"), amount = c(2921, 
    8000, 169.2, 71.5, 14.6, 4214)), row.names = c(NA, -6L), class = "data.frame")

I use the following code to achieve what I want:

library(dplyr)
library(tnet)

df2 <- df %>%
  select(from, to, amount) %>%
  group_by(from, to) %>%
  mutate(weights = 1/sum(amount)) %>%
  select(-amount) %>%
  distinct()

# from and to are character, so cbind() returns a character matrix
network <- cbind(df2$from, df2$to, df2$weights)
cl <- closeness_w(network, directed = TRUE, gconly = FALSE)
# Error in net[, "w"]^alpha : non-numeric argument to binary operator

# so I convert the from and to columns to integers to avoid that error
df2$from <- as.integer(df2$from)
df2$to <- as.integer(df2$to)
# then I run the code again
network <- cbind(df2$from, df2$to, df2$weights)
cl <- closeness_w(network, directed = TRUE, gconly = FALSE)

However, the output is not like the one on the website, which contains only a closeness score for each node. Instead, it has many rows with a value of 0, and I don't know why.

The output I got is as follows:

     node  closeness    n.closeness
   [1,]    1 0.00000000 0.000000000000
   [2,]    2 0.00000000 0.000000000000
   [3,]    3 0.00000000 0.000000000000
   [4,]    4 0.00000000 0.000000000000
   [5,]    5 0.00000000 0.000000000000
   ...........................................................
 [330,]  330 0.00000000 0.000000000000
 [331,]  331 0.00000000 0.000000000000
 [332,]  332 0.00000000 0.000000000000
 [333,]  333 0.00000000 0.000000000000
 [ reached getOption("max.print") -- omitted 8600 rows ]

Also, the edges in the i and j columns of the website's example are reciprocal, that is, 1->2 exists if and only if 2->1 exists. My data is not like that: for example, 5370 sent money to 5636, but 5636 has not sent any money to 5370. So how can I compute closeness correctly on such a directed transaction network? Has anyone tried a similar computation before?

EDIT: Since closeness_w treats weights as tie strengths rather than distances, I should have set the weights to sum(amount) instead of 1/sum(amount).
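Following the EDIT, here is a base-R sketch (no dplyr) that sums the amounts per from -> to pair to use as strengths, then maps the account labels to consecutive integer ids while keeping a lookup table. The idea is that closeness_w then sees ids 1..n instead of ids up to 8934. Only the from, to and amount columns of the example data are reproduced, the name node_labels is illustrative, and the closeness_w call is shown only as a comment.

```r
# from, to and amount columns of the example data in the question
df <- data.frame(
  from   = c("5370", "6390", "5370", "5370", "8934", "5370"),
  to     = c("5636", "5370", "8933", "8483", "5370", "7626"),
  amount = c(2921, 8000, 169.2, 71.5, 14.6, 4214),
  stringsAsFactors = FALSE
)

# One edge per from -> to pair, weighted by the total amount (tie strength)
edges <- aggregate(amount ~ from + to, data = df, FUN = sum)

# Lookup table: position k in node_labels corresponds to integer id k
node_labels <- sort(unique(c(as.character(edges$from), as.character(edges$to))))

# Numeric i/j/w edge list with integer ids 1..n, like the tnet example above
net <- cbind(i = match(as.character(edges$from), node_labels),
             j = match(as.character(edges$to), node_labels),
             w = edges$amount)

# cl <- tnet::closeness_w(net, directed = TRUE, gconly = FALSE)
# node_labels[cl[, "node"]] would then map each row back to its account
```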

Leyla Alkan

2 Answers

You get many rows of zeros because closeness_w returns a closeness value for every node id from 1 to 8934 (the maximum value in your matrix). If you filter for the node ids that appear in your data frame, you'll find the values you're looking for:

cl <- closeness_w(df2, directed = T, gconly=FALSE)
cl[cl[, "node"] %in% c(df2$from), ]

     node  closeness  n.closeness
[1,] 5370 1.37893704 1.543644e-04
[2,] 6390 0.03668555 4.106745e-06
[3,] 8934 5.80008056 6.492870e-04

Direction is taken into account: if you filter for the 'to' nodes, you'll see that only 5370 has a nonzero value:

cl[cl[, "node"] %in% c(df2$to), ]

     node closeness  n.closeness
[1,] 5370  1.378937 0.0001543644
[2,] 5636  0.000000 0.0000000000
[3,] 7626  0.000000 0.0000000000
[4,] 8483  0.000000 0.0000000000
[5,] 8933  0.000000 0.0000000000

If you go back to the example you're following and remove nodes from the middle of the data, you'll see that it gives zeros for the missing nodes. Also try setting directed = F to see the difference.

Update:

As an alternative to building the network matrix yourself, you can pass df2 directly to closeness_w once you have created it. Your node labels then become the row names of the result, and the node column is reduced to 1:n:

df2 <- df %>% 
  group_by(from, to) %>% 
  mutate(weights = 1/sum(amount)) %>% 
  select(from, to, weights) %>% 
  distinct

cl <- closeness_w(df2, directed = T, gconly=FALSE)
cl 

     node  closeness n.closeness
5370    1 1.37893704 0.229822840
5636    2 0.00000000 0.000000000
7626    3 0.00000000 0.000000000
8483    4 0.00000000 0.000000000
8933    5 0.00000000 0.000000000
6390    6 0.03668555 0.006114259
8934    7 5.80008056 0.966680093
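The value for 5370 can be checked by hand. This sketch assumes that, with the default alpha = 1, tnet converts each weight w into the distance mean(w)/w; that rule is inferred from the numbers above, not taken from tnet's documentation. Because none of 5370's out-neighbours has outgoing edges, its shortest paths are just its direct edges. Dividing by (number of nodes - 1) also accounts for the two different n.closeness values: 8934 ids in the first run versus 7 nodes here.

```r
# Amounts from the question, one row per from -> to pair:
# 5370->5636, 6390->5370, 5370->8933, 5370->8483, 8934->5370, 5370->7626
amount <- c(2921, 8000, 169.2, 71.5, 14.6, 4214)
w <- 1 / amount  # the weights used above, 1/sum(amount)

# Assumption: with alpha = 1, tnet uses mean(w) / w as the distance of each edge
d <- mean(w) / w

# 5370's out-edges are rows 1, 3, 4 and 6; none of those targets has out-edges,
# so every shortest path from 5370 is a single edge
closeness_5370 <- sum(1 / d[c(1, 3, 4, 6)])  # about 1.379

# n.closeness = closeness / (number of nodes - 1): ids 1..8934 versus 7 nodes
n_closeness <- closeness_5370 / c(8934 - 1, 7 - 1)
```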
LRRR
  • Thank you! So it's not a good idea to convert the `from` and `to` columns to integers, right? However, I don't know how else I would build the network. Is there a better way to handle this? – Leyla Alkan Sep 23 '20 at 12:41

The webpage you cite does not show that "closeness can be applied to disconnected networks". Instead, it proposes computing a quantity entirely different from closeness.

What they compute is in fact known as global efficiency, and was proposed in this paper:

You will find implementations in some packages. I have implemented this for igraph as well, and it will be included in version 0.9 of C/igraph (presumably also in some version of R/igraph). It is already accessible from IGraph/M, which serves as igraph's Mathematica interface.
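For comparison, a base-R sketch of global efficiency for an unweighted directed graph: the mean of 1/d(i, j) over all ordered pairs of distinct nodes, with unreachable pairs contributing 0. Averaging each node's sum of inverse distances divided by n - 1 gives the same number. The helper global_efficiency is hypothetical, ignores edge weights, and is not the igraph or IGraph/M implementation mentioned above; it runs one breadth-first search per node.

```r
# Hypothetical helper: global efficiency of an unweighted directed graph,
# E = 1 / (n (n - 1)) * sum over ordered pairs i != j of 1 / d(i, j)
global_efficiency <- function(from, to) {
  node <- sort(unique(c(from, to)))
  n <- length(node)
  # Out-neighbour list indexed 1..n (empty for nodes without out-edges)
  out <- split(match(to, node), factor(match(from, node), levels = seq_len(n)))
  total <- 0
  for (s in seq_len(n)) {
    # Breadth-first search from s gives hop distances; Inf means unreachable
    dist <- rep(Inf, n)
    dist[s] <- 0
    queue <- s
    while (length(queue) > 0) {
      v <- queue[1]
      queue <- queue[-1]
      for (u in out[[v]]) {
        if (is.infinite(dist[u])) {
          dist[u] <- dist[v] + 1
          queue <- c(queue, u)
        }
      }
    }
    total <- total + sum(1 / dist[-s])  # 1 / Inf = 0
  }
  total / (n * (n - 1))
}

# Path 1 -> 2 -> 3: reachable pairs at distances 1, 2 and 1, so E = 2.5 / 6
global_efficiency(c(1, 2), c(2, 3))
```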

Szabolcs