I have a large text file (>10 million rows, >1 GB) that I want to process one line at a time to avoid loading the entire file into memory. After processing each line I want to save some variables into a big.matrix object. Here is a simplified example:
library(bigmemory)
library(pryr)

con <- file('x.csv', open = "r")
x <- big.matrix(nrow = 5, ncol = 1, type = 'integer')

for (i in 1:5) {
  print(c(address(x), refs(x)))            # check whether x appears to be copied each iteration
  y <- readLines(con, n = 1, warn = FALSE) # read one line
  x[i] <- 2L * as.integer(y)               # process the line and store the result
}
close(con)
where x.csv contains
4
18
2
14
16
Following the advice at http://adv-r.had.co.nz/memory.html, I have printed the memory address of my big.matrix object, and it appears to change with each loop iteration:
[1] "0x101e854d8" "2"
[1] "0x101d8f750" "2"
[1] "0x102380d80" "2"
[1] "0x105a8ff20" "2"
[1] "0x105ae0d88" "2"
Can big.matrix objects be modified in place? Is there a better way to load, process, and then save these data? The current method is slow!
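
For reference, the kind of alternative I have been imagining is reading the file in chunks and writing each block into the big.matrix with a single assignment, roughly as in the sketch below (untested at scale; the chunk size and the assumption that the file has exactly nrow(x) lines are mine):

library(bigmemory)

con <- file('x.csv', open = "r")
x <- big.matrix(nrow = 5, ncol = 1, type = 'integer')

chunk_size <- 1000  # assumed block size; tune for the real file
i <- 1
repeat {
  y <- readLines(con, n = chunk_size, warn = FALSE)  # read a block of lines
  if (length(y) == 0) break                          # stop at end of file
  vals <- 2L * as.integer(y)                         # process the whole block at once
  x[i:(i + length(y) - 1), 1] <- vals                # one assignment per chunk
  i <- i + length(y)
}
close(con)

The idea is that fewer, larger assignments should reduce per-iteration overhead, but I don't know whether this addresses the apparent copying shown above.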