Data:
I have a Shiny dashboard application and my dataset is around 600 MB in size. It grows by about 100 MB every month. My data resides in a local MySQL database.
MenuItems:
I have 6 - 7 sidebar menuItems on my dashboard and each of them has 10 - 12 different outputs - charts and tables. Each of these tabs has 3 - 6 inputs such as selectizeInput, slider, date range, etc. to filter the data.
Data subsets:
Since I cannot load all the data into memory, for every menu item I create a subset of the data based on the date range, keeping the default range to just the 2 - 3 days before the system date.
For example:
df1 <- reactive({df[df$date >= dateinput[1] & df$date <= dateinput[2], ]})
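A minimal sketch of the same date-window subset, written outside Shiny so it can be run directly. The toy data frame, the `window_days` value, the fixed reference date and the helper `subset_by_date` are illustrative assumptions, not part of the actual app.

```r
# Toy stand-in for the real table (assumption: a Date column named `date`).
df <- data.frame(
  date   = as.Date("2024-01-01") + 0:9,
  Gender = rep(c("Male", "Female"), 5),
  value  = 1:10
)

# Hypothetical helper: keep rows whose date lies in [from, to], inclusive.
subset_by_date <- function(data, from, to) {
  data[data$date >= from & data$date <= to, , drop = FALSE]
}

# Default window: the last `window_days` days up to a reference date.
# In the app the reference date would be Sys.Date(); a fixed date is used
# here so the example is reproducible.
window_days <- 3
ref_date <- as.Date("2024-01-10")
dateinput <- c(ref_date - (window_days - 1), ref_date)

df1_data <- subset_by_date(df, dateinput[1], dateinput[2])
nrow(df1_data)
```

Note the comparison operators `>=` and `<=`: writing `<-` in their place would be parsed by R as an assignment rather than a comparison.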
df1 holds the data for my first menu item. Depending on the selectInput or other inputs, I then filter this data further. For example, if I have a selectInput for Gender (male and female), I subset df1 further to get df2:
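df2 <- reactive({
  # df1 is a reactive expression, so it has to be called to get the data frame
  data <- df1()
  if (is.null(input$Gender)) {
    # nothing selected: return the date-filtered data unchanged
    data
  } else {
    # keep only the rows matching the selected gender(s)
    data[data$Gender %in% input$Gender, ]
  }
})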
If I have more than one input, I subset df1 further in the same way and return the result as df2. df2 then becomes the reactive dataset for all the charts and tables in that menuItem.
The more menuItems I add, the more subsets I create to suit the filters and analysis.
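The chaining of several filters can be sketched as follows. This is only an illustration on toy data: the `age` column, the `age_range` filter and the helper `apply_filters` are hypothetical names, and an unset input is assumed to arrive as NULL or a zero-length value.

```r
# Toy data standing in for the output of df1 (assumption: columns Gender and age).
df1_data <- data.frame(
  Gender = c("Male", "Female", "Male", "Female", "Male"),
  age    = c(23, 35, 41, 29, 52)
)

# Hypothetical helper: apply each filter only when its input is set.
# NULL or a zero-length value means "no filter", as with an empty selectizeInput.
apply_filters <- function(data, gender = NULL, age_range = NULL) {
  keep <- rep(TRUE, nrow(data))
  if (length(gender) > 0) {
    keep <- keep & data$Gender %in% gender
  }
  if (length(age_range) == 2) {
    keep <- keep & data$age >= age_range[1] & data$age <= age_range[2]
  }
  data[keep, , drop = FALSE]
}

# Both filters set: males aged 30 to 60.
df2_data <- apply_filters(df1_data, gender = "Male", age_range = c(30, 60))

# No filters set: the data comes back unchanged.
unfiltered <- apply_filters(df1_data)
```

Inside the app, the body of the df2 reactive would presumably call such a helper with `df1()` and the relevant `input$...` values, so that every chart and table in the menuItem reads from one filtered dataset.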
I face two problems:
- On older machines, the app does not load at all, and
- On newer machines, the app loads very slowly, sometimes taking 5 - 6 minutes.
After the first data load, the charts and tables render faster on reactive changes.
To counter this, I have tried moving all common and repetitive parameters and libraries to global.R.
I have two questions:
1. Are there any basic hygiene factors that one needs to keep in mind when mining data in R, especially through Shiny? (Mining in R itself is extremely fast.)
2. I have read about parallel processing, but almost all the examples talk about distributing a single heavy calculation. Can parallel processing be used to distribute the subsetting of the data, or the preparation of the charts / tables?
Please note, I am a researcher and not a programmer, but I have recently learnt to use Shiny and to host applications on the cloud or locally.
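To make the second question concrete, here is a minimal sketch of what distributing per-subset work might look like with the base `parallel` package. The toy data, the two-worker cluster and the per-group mean are assumptions chosen for illustration. Whether this is faster in practice is exactly what is being asked, since each piece of data has to be copied to a worker process first.

```r
library(parallel)

# Toy data (assumption: a grouping column Gender and a numeric column value).
df <- data.frame(
  Gender = rep(c("Male", "Female"), 5),
  value  = 1:10
)

# Split the data into one piece per group; each piece is sent to a worker.
pieces <- split(df, df$Gender)

# Start a small cluster of worker processes (2 is an arbitrary choice).
cl <- makeCluster(2)

# Each worker computes a summary for its piece; results come back in order.
group_means <- parLapply(cl, pieces, function(part) mean(part$value))

# Always release the workers when done.
stopCluster(cl)

unlist(group_means)
```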
Guidance on this will be very helpful for many novice users of R like me.