
I am trying to make a Shiny app that can be hosted for free on shinyapps.io. Free hosting requires that all uploaded data/code be <1GB, and that the app use <1GB of memory at any time while running.

The data

The underlying data (that I'm uploading) is 1000 iterations of a network with ~3050 nodes. Each interaction between nodes (~415,000 interactions per network) has 9 characteristics (of the origin, the destination, and the interaction itself) that I need to keep track of. The app needs to read in data from all 1000 networks for user-selected node(s) meeting user-specified criteria (those 9 characteristics) and summarize it in a map and a table. The app works if I use 1000 one-per-network RData files (more on the format below), but it takes ~10 minutes to load, and I'd like to speed that up. The load looks roughly like the sketch below.
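A minimal sketch of that per-network load, assuming hypothetical file names and a data.frame called `freq` in each file (the real app code differs):

```r
# Hypothetical sketch: load each of the 1000 per-network RData files and
# keep only the rows for the selected node(s). File and object names are
# assumptions, not the actual app code.
library(dplyr)

load_node_data <- function(node_ids, dir = "data") {
  files <- file.path(dir, sprintf("network_%04d.RData", 1:1000))
  out <- lapply(files, function(f) {
    e <- new.env()
    load(f, envir = e)                 # each file holds one data.frame, `freq`
    filter(e$freq, nodeID %in% node_ids)
  })
  bind_rows(out)
}
```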

A couple notes about what I've done/tried, but I'm not tied to any of this if you have better ideas.

  • The data is too large to store as CSVs (and still fall under the 1GB upload limit), so I've been saving it as RData files of a data.frame with "xz" compression.
  • To further reduce size, I've turned the data into frequency tables of the 9 variables of interest (a sketch of this save step follows the list).
  • In a desktop version, I created 10 summary files that each contained the data for 100 networks (~5 minutes to load), but these are too large to be read into memory in a free Shiny app.
  • I tried making RData files for each node (instead of splitting by network), but they're too large for the 1GB upload limit.
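A minimal sketch of that frequency-table save step, assuming a hypothetical `interactions` data.frame and made-up column names for the 9 variables:

```r
# Hypothetical sketch: collapse one network's ~415,000 interactions into a
# frequency table over the 9 variables, then save with "xz" compression.
# `interactions` and all column names here are assumptions.
library(dplyr)

freq <- interactions %>%
  count(orig_type, orig_region, orig_size,
        dest_type, dest_region, dest_size,
        int_type, int_strength, int_weight,
        name = "n")

save(freq, file = "network_0001.RData", compress = "xz")
```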

I'm not sure there are better ways to package the data (but again, happy to hear ideas!), so I'm looking to optimize processing it.

Finally, a question

Is there a way to read only certain rows from a compressed RData file, based on some value (e.g., nodeID)? This post (quickly load a subset of rows from a data.frame saved with `saveRDS()`) makes me think that might not be possible because the file is compressed. In looking at other options, awk keeps coming up, but I'm not sure whether it would work on an RData file (I only seem to see data.frame/data.table/CSV implementations).
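For what it's worth, here is the kind of selective read I'm hoping for, sketched with SQLite (not my current setup; table, column, and file names are placeholders):

```r
# Sketch of a SQLite alternative: write all networks into one indexed table
# once, then have the app pull only the matching rows. Names are placeholders.
library(DBI)
library(RSQLite)

con <- dbConnect(SQLite(), "networks.sqlite")

# One-time ingest, looping over the per-network frequency tables:
# dbWriteTable(con, "interactions", freq, append = TRUE)
# dbExecute(con, "CREATE INDEX idx_node ON interactions (nodeID)")

# In the app: read only the rows for the selected node(s).
res <- dbGetQuery(con,
                  "SELECT * FROM interactions WHERE nodeID = ?",
                  params = list(42))
dbDisconnect(con)
```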

  • I am not aware of a way to read only certain rows from a compressed RData file. Two suggestions, though. The SQL approach (may be slower): https://stackoverflow.com/questions/18791396/how-to-read-huge-csv-file-into-r-by-row-condition. The disk.frame package: https://cran.r-project.org/web/packages/disk.frame/vignettes/ingesting-data.html. I haven't personally used it, but from the vignette it looks like you can convert your data to a `disk.frame`, which supports compression, then read the data in chunks and apply the required filter (sketched after these comments). – jav Oct 17 '19 at 00:20
  • Please provide some files/code as examples. Optimizing resources is not an easy problem, and without any context we can't help you. – Corentin Limier Oct 17 '19 at 08:28
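A minimal sketch of the disk.frame suggestion above (untested; `freq`, `nodeID`, and the output directory are placeholders):

```r
# Sketch of the disk.frame approach: convert the data once, then filter
# chunk-by-chunk without holding everything in memory. Names are placeholders.
library(disk.frame)
library(dplyr)
setup_disk.frame()

# One-time conversion; `compress` sets the fst compression level (0-100):
# nets <- as.disk.frame(freq, outdir = "networks.df", compress = 50)

nets <- disk.frame("networks.df")
node_subset <- nets %>%
  filter(nodeID %in% c(42, 43)) %>%
  collect()
```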
