
There is a question with a very similar title (cbind runs very slow), but it does not address my problem. I am retrieving 10+ JSON files with 100 variables each, and I am trying to build one big data.frame/table with 1000 columns. In practice I do not fetch the very same JSON file as in the example below, but different ones. Ideally only the problematic line `cx <- cbind(cx, bx)` would be sped up, as the other steps (`unlist`, `as.data.table`) work well for me and I would not know what else to use. I know "cbind is slow", but what alternatives do I have? Ideally in base R.

library(jsonlite)
library(data.table)

starttime <- Sys.time()
for (i in 1:10) {          # loop through all 10 json files
  zz <- Sys.time()         # measuring the time for each loop
  urlx <- "http://mysafeinfo.com/api/data?list=englishmonarchs&format=json"
  jsnx <- fromJSON(urlx)
  if(i==1) {
    ax <- unlist(jsnx)
    bx <- as.data.table(ax)
    cx <- bx
  }
  for (j in 1:100) {        # loop through all 100 variables in each file
    ax <- unlist(jsnx)
    bx <- as.data.table(ax)
    cx <- cbind(cx, bx) # <---- VERY SLOW ----
  }
  zz <- round(Sys.time()-zz,1)
  print(sprintf("%1.1f", zz))
  flush.console()
}
endtime  <- Sys.time()
endtime-starttime

This gets slower and slower with more files; here are my timings.

[1] "0.7"
[1] "1.3"
[1] "1.3"
[1] "1.6"
[1] "2.1"
[1] "2.2"
[1] "2.5"
[1] "3.2"
[1] "3.4"
[1] "3.5"
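The slowdown comes from copying the ever-growing `cx` on every `cbind`. A base-R workaround is to collect each column in a preallocated list and bind once at the end with `do.call`. This is only a minimal sketch: `make_col` is a made-up placeholder for the real `as.data.table(unlist(jsnx))` step.

```r
# Sketch: accumulate columns in a preallocated list, then bind once.
# make_col() stands in for the real as.data.table(unlist(jsnx)) step.
make_col <- function(i, j) data.frame(ax = seq_len(5))

n_files <- 10
n_vars  <- 100
cols <- vector("list", n_files * n_vars)  # preallocate the list

k <- 0
for (i in seq_len(n_files)) {
  for (j in seq_len(n_vars)) {
    k <- k + 1
    cols[[k]] <- make_col(i, j)           # cheap: only stores the piece
  }
}

cx <- do.call(cbind, cols)                # one cbind instead of 1000
dim(cx)                                   # 5 rows, 1000 columns
```

Filling a preallocated list is O(1) per element, so the total cost is dominated by the single `cbind` at the end instead of 1000 increasingly expensive copies.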
  • The problem is progressively expanding your data.frame. If you instead create a list with the intermediary results you can use `do.call(cbind, result_list)` to perform only a single `cbind`. Eg. `result_list <- vector("list", n); result_list[[i]] <- [do stuff here]; do.call(cbind, result_list)`. – Oliver May 27 '20 at 15:45
  • Alternatively: As you are using `data.table`, you can progressively add the columns using `cx[, c(column_names) := ax]`, which should be somewhat linear in time-complexity. – Oliver May 27 '20 at 15:49
  • Are you sure your code is doing what you want it to? In particular the part where you're looping through all 100 variables? It looks to me like you're just running the same code 100 times, and the output looks nonsensical. – asachet May 27 '20 at 15:51
  • There are several packages with a `fromJSON` function. Please start your scripts with calls to `library(pkgname)` whenever you use non-base functions. – Rui Barradas May 27 '20 at 15:53
  • Related: [Dynamically _filling_ an object using a for loop is fine - what causes problems is when you dynamically _build_ an object using a for loop (e.g. using `cbind`)](https://stackoverflow.com/questions/49500364/loop-to-dynamically-fill-dataframe-r). – Henrik May 27 '20 at 15:54
  • Added the library calls, thanks Rui Barradas. The code is nonsensical in this example because it uses the very same JSON file and the same variable. In my actual code I use different JSON files and a different variable each time. – Gecko May 27 '20 at 16:18
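Since `data.table` is already loaded, the column-by-reference approach can be sketched like this (placeholder values and made-up column names `var_1`, `var_2`, …): `data.table::set()` adds each column in place, so the growing table is never copied.

```r
library(data.table)

# Sketch: grow the table by reference with data.table::set().
# The values are placeholders for the real unlist(jsnx) output.
cx <- data.table(id = seq_len(5))
for (j in seq_len(1000)) {
  ax <- seq_len(5)                            # placeholder column
  set(cx, j = paste0("var_", j), value = ax)  # add by reference, no copy
}
ncol(cx)                                      # 1001 columns, built in roughly linear time
```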
