
I'm converting a large, dense network matrix to an edgelist, and the result takes far more memory in R than I expected. In my case, a 12 MB matrix (1259 x 1259) becomes a 71 MB edgelist (i, j, w). I'm using the igraph package for the conversion, but I don't think the package is the cause. Here is what I'm doing, with made-up data.

library(igraph)
A <- matrix(runif(25), 5, 5)
A <- A %*% t(A)
diag(A) <- 0

I made the matrix symmetric with a zero diagonal because that is what my data looks like, but I don't think it matters for this question. Here I use igraph:

# using igraph here
adj <- graph.adjacency(as.matrix(A),weighted=TRUE)
object.size(A) # 400 bytes
object.size(adj) # 2336 bytes

I get that the igraph adj object will be bigger. That isn't the issue.

el <- get.edgelist(adj)
class(el) # "matrix"
object.size(el) # 520 bytes

w <- E(adj)$weight
class(w) # "numeric"
object.size(w) # 200 bytes

# expect something ~720 bytes
adj_w <- cbind(el,w)
class(adj_w) # "matrix"
object.size(adj_w) # 1016 bytes

Why does adj_w take so much more memory? The growth doesn't even seem linear: here the sizes go from 400 bytes (original) to 1016 bytes (final), but with my larger data they go from 12 MB to 71 MB.

FYI: I'm using RStudio locally on a MacBook Pro with the latest versions (just installed it all last week).

Jesse Blocher

1 Answer


adj_w is larger because cbind added column names (a dimnames attribute), taking "w" from the name of the vector. Remove the dimnames and the size drops to what the data alone requires.

head(adj_w)
#                 w
# [1,] 1 2 1.189969
# [2,] 1 3 1.100843
# [3,] 1 4 0.805436
# [4,] 1 5 1.001632
# [5,] 2 1 1.189969
# [6,] 2 3 1.265916

object.size(adj_w)
# 1016 bytes

attributes(adj_w)
# $dim
# [1] 20  3
# 
# $dimnames
# $dimnames[[1]]
# NULL
# 
# $dimnames[[2]]
# [1] ""  ""  "w"
# 
# 

adj_w2 <- adj_w
dimnames(adj_w2) <- NULL
object.size(adj_w2)
# 680 bytes

To avoid the automatic column name addition, you can first convert your vector to a matrix...

adj_w3 <- cbind(el, matrix(w))
object.size(adj_w3)
# 680 bytes

...or, alternatively, pass the deparse.level = 0 argument to cbind.

adj_w4 <- cbind(el, w, deparse.level = 0)
object.size(adj_w4)
# 680 bytes
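
The same effect can be reproduced without igraph. The following is a minimal sketch in base R, assuming a stand-in two-column index matrix and a random weight vector (both made up here for illustration, not taken from the question's data):

```r
# Stand-in data: a 20 x 2 index matrix and 20 weights (hypothetical values).
el <- matrix(as.numeric(1:40), ncol = 2)
w <- runif(20)

# Default deparse.level = 1: cbind derives a column name from the symbol `w`.
named <- cbind(el, w)
colnames(named)

# deparse.level = 0: no names are derived, so no dimnames attribute is attached.
unnamed <- cbind(el, w, deparse.level = 0)
is.null(dimnames(unnamed))

# The values are identical; only the dimnames attribute differs.
identical(unname(named), unnamed)
object.size(named) > object.size(unnamed)
```

Because the extra cost comes from the attribute rather than from the values, it should be a fixed overhead that does not grow with the number of rows.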
Alexey Shiklomanov
You answered the technical part. I was also making a bad logical assumption. In my mind, A and adj_w represent the same network and so should be the same size, yet A is 400 bytes and adj_w (even with your adjustment) is 680 bytes. Why so much bigger? My mistake was forgetting that the indexing now has to be stored. In A, the row and column positions 1:N are implied, not stored, but the first two columns of adj_w store those indexes explicitly, which almost triples the storage size. – Jesse Blocher Apr 19 '17 at 03:04
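
The storage arithmetic behind that observation can be sketched in a few lines of R. This is a back-of-the-envelope estimate only. It assumes every value is stored as an 8-byte double, that every off-diagonal entry becomes one edge row, and it ignores the fixed per-object overhead that object.size also reports. The helper names dense_bytes and edgelist_bytes are made up for illustration.

```r
bytes_per_double <- 8

# Dense n x n matrix: one double per cell; row and column positions are implicit.
dense_bytes <- function(n) n * n * bytes_per_double

# Edgelist: n * (n - 1) off-diagonal edges, each stored as three doubles (i, j, w).
edgelist_bytes <- function(n) n * (n - 1) * 3 * bytes_per_double

dense_bytes(5)      # 200
edgelist_bytes(5)   # 480

# Ratio of edgelist payload to dense payload is 3 * (n - 1) / n.
edgelist_bytes(5) / dense_bytes(5)        # 2.4
edgelist_bytes(1259) / dense_bytes(1259)  # just under 3
```

For the 5 x 5 example, the difference between these payloads and the reported object sizes (400 and 680 bytes) is the per-object overhead. The estimate covers only the data in the final matrix, so it is not meant to reproduce the 71 MB figure from the question.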