
How does one find the degree centrality of nodes in a table like this:

article   users
         u1  u2  u3  u4 u5 u6 u7
 1        1   1   1   0  0  0  0
 2        0   1   0   1  1  0  0
 3        1   0   0   1  0  1  1

This is just an example of my data. I have a very large file consisting of 1,533 articles and about 52,000 users.

I want to find the centrality of articles and centrality of users in the matrix.

MrFlick
Naveed Khan Wazir

1 Answer


Degree centrality simply counts the number of other nodes that each node is "connected" to. So to do this for users, for example, we have to define what it means to be connected to another user. The simplest approach asserts a connection if a user has at least one article in common with another user. A slightly more complex (and probably better) approach weights connectivity by the number of articles in common. So if user 1 has 10 articles in common with user 2 and 3 articles in common with user 3, we say that user 1 is "more connected" to user 2 than to user 3. In what follows, I'll use the latter approach.
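To make the two definitions concrete, here is a minimal sketch on a small hypothetical 3-article by 3-user matrix (not the data from the question). The names `shared`, `binary.degree` and `weighted.degree` are mine, chosen for illustration.

```r
# Hypothetical 0/1 matrix: rows are articles, columns are users
M <- matrix(c(1, 1, 0,
              1, 1, 0,
              1, 0, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("a", 1:3), paste0("u", 1:3)))

# shared[i, j] = number of articles users i and j have in common
shared <- crossprod(M)      # same as t(M) %*% M
diag(shared) <- 0           # a user is not "connected" to itself

binary.degree   <- rowSums(shared > 0)  # simplest approach: any article in common
weighted.degree <- rowSums(shared)      # weighted approach: count the shared articles

rbind(binary.degree, weighted.degree)
```

Here u2 and u3 are each connected to one other user, so their binary degree is the same, but u2 shares two articles with u1 while u3 shares only one, so the weighted version ranks u2 higher.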

This code creates a sample matrix with 15 articles and 30 users, sparsely connected. It then calculates a 30 X 30 adjacency matrix for users where the [i,j] element is the number of articles user i has in common with user j. Then we create a weighted igraph object from this matrix, and let igraph calculate the degree centrality.

Since degree centrality does not take the weights into account, we also calculate eigenvector centrality (which does take the weights into account). In this very simple example, the differences are subtle but instructive.
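As a rough illustration of how the weights enter, the sketch below computes eigenvector centrality by hand from a small hypothetical weighted adjacency matrix using base R's `eigen()`. The rescaling to a maximum of 1 is a convention I am assuming here, and the matrix `A` is made up for the example.

```r
# Hypothetical weighted adjacency: u1-u2 share 2 articles, u1-u3 share 1
A <- matrix(c(0, 2, 1,
              2, 0, 0,
              1, 0, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("u", 1:3), paste0("u", 1:3)))

# Unweighted degree only asks whether a weight is non-zero
degree.unweighted <- rowSums(A > 0)

# Eigenvector centrality: the eigenvector belonging to the largest eigenvalue.
# eigen() returns eigenvalues in decreasing order, so column 1 is the one we want.
e <- eigen(A, symmetric = TRUE)
v <- abs(e$vectors[, 1])          # the sign of an eigenvector is arbitrary
ev.centrality <- v / max(v)       # rescale so the most central node scores 1
names(ev.centrality) <- rownames(A)

degree.unweighted
round(ev.centrality, 3)
```

Both u2 and u3 have degree 1, but u2 gets the higher eigenvector score because its single connection carries twice the weight.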

# this just sets up the sample - you already have the matrix M
n.articles <- 15
n.users    <- 30
set.seed(1)    # for reproducibility
M <- matrix(sample(0L:1L, n.articles*n.users, prob=c(0.8,0.2), replace=TRUE), ncol=n.users)

# you start here...
# m.adj[i,j] = number of articles users i and j have in common (upper triangle only)
m.adj <- matrix(0L, ncol=n.users, nrow=n.users)
for (i in 1:(n.users-1)) {
  for (j in (i+1):n.users) {
    m.adj[i,j] <- sum(M[,i]*M[,j])
  }
}
library(igraph)
# only the upper triangle of m.adj is filled, so read just that part
g <- graph_from_adjacency_matrix(m.adj, weighted=TRUE, mode="upper")
palette <- c("purple","blue","green","yellow","orange","red")
par(mfrow=c(1,2))
# degree centrality
c.d   <- degree(g)
col <- as.integer(5*(c.d-min(c.d))/diff(range(c.d))+1)
set.seed(1)
plot(g,vertex.color=palette[col],main="Degree Centrality",
     layout=layout_with_fr)

# eigenvector centrality
c.e   <- eigen_centrality(g)$vector
col <- as.integer(5*(c.e-min(c.e))/diff(range(c.e))+1)
set.seed(1)
plot(g,vertex.color=palette[col],main="Eigenvector Centrality",
     layout=layout_with_fr)

So in both cases node 15 has the highest centrality. However, node 28 has a higher degree centrality and a lower eigenvector centrality than node 27. This is because node 28 is connected to more nodes, but the strength of the connections is lower.
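For anyone wondering how the vertex colours are chosen: the `col` expression rescales the centrality scores linearly onto the integers 1 to 6, which are then used as indices into the six-colour palette. A minimal sketch with made-up scores (the vector `scores` is hypothetical):

```r
palette <- c("purple", "blue", "green", "yellow", "orange", "red")

# Hypothetical centrality scores for four nodes
scores <- c(2, 4, 7, 12)

# (scores - min) / range lies in [0, 1]; times 5 plus 1 lies in [1, 6];
# as.integer() truncates, so only the maximum score reaches index 6
col <- as.integer(5 * (scores - min(scores)) / diff(range(scores)) + 1)

data.frame(score = scores, index = col, colour = palette[col])
```

So the least central node is drawn purple, the most central one red, and everything else falls on the colours in between.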

The same approach can of course be used to calculate article centrality; just use the transpose of M.
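A minimal sketch of the transposed calculation, again on a small hypothetical matrix: two articles are treated as connected when they share users, and the weight is the number of users they share.

```r
# Hypothetical 0/1 matrix: rows are articles, columns are users
M <- matrix(c(1, 1, 0,
              1, 1, 0,
              1, 0, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("a", 1:3), paste0("u", 1:3)))

# Applying the user calculation to t(M) gives the article-by-article matrix;
# tcrossprod(M) computes the same thing, M %*% t(M), directly
a.adj <- crossprod(t(M))
stopifnot(isTRUE(all.equal(a.adj, tcrossprod(M))))
diag(a.adj) <- 0

article.binary   <- rowSums(a.adj > 0)  # number of articles sharing any user
article.weighted <- rowSums(a.adj)      # total number of shared users

rbind(article.binary, article.weighted)
```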

This approach will not work with 52,000 users: the adjacency matrix would contain about 2.7 billion elements (52,000 × 52,000). I'm not aware of a workaround for this; if someone else is, I'd like to hear it. So if you need to tabulate a centrality score for each of the 52,000 users, I can't help you. On the other hand, if you just want to see patterns, it might be possible to carry out the analysis on a random sample of users (say, 10%).
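One partial way around the size problem, offered as a sketch rather than a full solution: if M contains only 0s and 1s, the weighted degree of each user (the row sums of the adjacency matrix) can be computed from M directly, without ever building the users-by-users matrix. This does not give the unweighted degree or eigenvector centrality. The block also shows the random-sample idea; the names `readers`, `w.degree.direct` and `M.sample` are mine.

```r
# Same kind of sample data as in the answer (small, so the result can be checked)
set.seed(1)
n.articles <- 15
n.users    <- 30
M <- matrix(sample(0L:1L, n.articles * n.users, prob = c(0.8, 0.2), replace = TRUE),
            ncol = n.users)

# Weighted degree of user i = sum over the articles a that i touched of
# (number of users on article a, minus 1 for user i itself). Valid for 0/1 data only.
readers <- rowSums(M)
w.degree.direct <- as.vector(crossprod(M, readers - 1))

# Check against the full adjacency matrix (only feasible at this small size)
A <- crossprod(M)
diag(A) <- 0
stopifnot(all(w.degree.direct == rowSums(A)))

# Random 10% sample of users (columns), for the pattern-spotting suggestion
keep <- sample(n.users, size = ceiling(0.1 * n.users))
M.sample <- M[, keep, drop = FALSE]
dim(M.sample)
```

With 1,533 articles and 52,000 users, `crossprod(M, readers - 1)` only ever touches the articles-by-users matrix, so memory use stays at the size of the data you already have.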

jlhoward
  • Sir, running the code up to the loop gives me an error: Error in `[<-`(`*tmp*`, i, j, value = 0) : subscript out of bounds. Am I making a mistake? – Naveed Khan Wazir Jul 22 '14 at 06:36
  • `n.users` is the number of columns in your matrix, and `n.articles` is the number of rows. Did you set those? – jlhoward Jul 22 '14 at 12:45
  • Yes, I set it like this: `m.adj <- matrix(0, nc=5216, nr=90); for (i in 1:5215) { for (j in (i+1):5216) { m.adj[i,j] <- sum(df4[,i]*df4[,j]) } }` – Naveed Khan Wazir Jul 22 '14 at 14:01
  • Not really sure what you're doing here, but the error comes about because you define `m.adj` to have 90 rows (`nr=90`), and then reference row i in the loop where i is in 1:5215. – jlhoward Jul 22 '14 at 14:07
  • And if I run `dim(df4)`, it returns 89 5216. – Naveed Khan Wazir Jul 22 '14 at 14:13
  • It looks like you want to use the first 5216 columns (users). If so, set `n.users<-5216` and `n.articles<-nrow(df4)` and run the code in the answer, starting at "you start here" – jlhoward Jul 22 '14 at 14:16
  • Yup sir, it works now. Thanks for your kindness and time. – Naveed Khan Wazir Jul 22 '14 at 14:29
  • I am encountering a similar question, can you please provide more explanation about `palette <- c("purple","blue","green","yellow","orange","red")` and `col <- as.integer(5*(c.d-min(c.d))/diff(range(c.d))+1)` ? – Edward Lin Feb 23 '18 at 10:06