I'm making a shiny app which reads from a file, does some processing, and produces a table in the UI. The problem is that the file may be very big, and the analysis is slow, so processing the table may take a long time (often several minutes, possibly half an hour). I would like to display a partial table, and add to it every time a new row has been computed so that the user can see the data as it is generated.

I'm using a reactive value to store the data for the table, and then rendering the table with renderTable().

Below is an illustration of the problem (it's not my actual code, for cleanliness reasons, but it works as an illustration):

library(shiny)

ui <- fluidPage(
  titlePanel("title"),
  sidebarLayout(
    sidebarPanel(
      actionButton(inputId = "button", label = "make table")
    ),
    mainPanel(
      uiOutput("table")
    )
  )
)

makeTable <- function(rv){
  data = c(1:10)
  withProgress({
    for(i in 1:5){
      d = runif(10)
      data = rbind(data, d)
      Sys.sleep(1)
      rv$table = data
      incProgress(1/5)
    }
  })
  rv$table = data
}

server <- function(input, output){
  rv = reactiveValues(table = c())

  observeEvent(input$button, {
    makeTable(rv)
  })

  output$table = renderTable(
    rv$table
  )
}

shinyApp(ui, server)

I put Sys.sleep(1) in the loop so that the table is built over 5 seconds. Currently, despite rv$table = data appearing inside the for loop, the table is not shown until the whole thing is finished. Is there a way to modify the code above so that the rows of the table (generated by each iteration of the for loop) are added each second, rather than all at the end?

Edit: I should have made it clear that the file is read in quickly (before the make table button is pressed), the long part is the processing inside the for loop (which depends on the size of the file). I'm not having trouble reading from or writing to files - I'm wondering if there's a way to assign rv$table = data inside the for loop, and have that change reflected in the UI while the loop is still running (and in general, how to make any arbitrary UI and reactive value in a loop behave that way)

thejbug
  • Writing that file is just a way to make the current progress of your processing available to the shiny app without having your computations freezing the shiny app. – ismirsehregal May 23 '19 at 21:22
  • To make it clear: if you have a for loop running in your shiny app (single thread) nothing else will be executed until that loop is exited. That is why we suggest to run that loop in a different thread (asynchronously) than your shiny app. The file we are talking about is just a way to communicate among those threads. – ismirsehregal May 23 '19 at 21:36
  • Is there no general solution? It seems like it would be a fairly common thing to want to do - I just need the UI to react to changes as they occur in a for loop, even if it's just printing the index of the loop in a textOutput, for instance. – thejbug May 23 '19 at 21:36
  • I think I see what you're saying. Just one more thing: in my example I have a progress bar around the for loop, but when I place it in a future({}), I get an error: Warning: Error in withProgress: 'session' is not a ShinySession object. Do you know any way to fix this? It seems like I'll need to use futures... – thejbug May 23 '19 at 21:42
  • As I mentioned in my answer `library(ipc)` provides async progress bars. Please check the examples. – ismirsehregal May 23 '19 at 21:47

2 Answers

I would detach the processing part from your shiny app, to keep it responsive (R is single threaded).

Here is an example which continuously writes to a file in a background R process created via library(callr). You can then read in the current state of the file via reactiveFileReader.

Edit: if you want to start the file processing per session, just place the r_bg() call inside the server function (see the commented-out line in the server function below). Furthermore, the processing is currently done row-wise. In your actual code you should consider processing the data batch-wise instead (n rows, whatever is reasonable for your code).

library(shiny)
library(callr)

processFile <- function(){

  filename <- "output.txt"

  if(!file.exists(filename)){
    file.create(filename)
  }

  for(i in 1:24){
    d = runif(1)
    Sys.sleep(.5)
    write.table(d, file = filename, append = TRUE, row.names = FALSE, col.names = FALSE)
  }

  return(NULL)
}


# start background R session ----------------------------------------------
rx <- r_bg(processFile)


# create shiny app --------------------------------------------------------

ui <- fluidPage(
  titlePanel("reactiveFileReader"),
  sidebarLayout(
    sidebarPanel(
    ),
    mainPanel(
      uiOutput("table")
    )
  )
)

server <- function(input, output, session){

  # rx <- r_bg(processFile) # if you want to start the file processing session-wise

  readOutput <- function(file){
    if(file.exists(file)){
      tableData <- tryCatch({read.table(file)}, error=function(e){e}) 
      if (inherits(tableData, 'error')){
        tableData = NULL
      } else {
        tableData
      }
    } else {
      tableData = NULL
    }
  }

  rv <- reactiveFileReader(intervalMillis = 100, session, filePath = "output.txt", readFunc = readOutput)

  output$table = renderTable({
    rv()
  })

  session$onSessionEnded(function() {
    file.remove("output.txt")
  })

}

shinyApp(ui, server)
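
The batch-wise processing suggested in the edit note could look roughly like the sketch below. It is only an illustration: `process_in_batches`, `n_rows`, and `batch_size` are hypothetical names that do not appear in the answer, and `runif()` stands in for the real computation. Each batch is appended to the file in a single `write.table()` call, so the file changes (and `reactiveFileReader` re-reads it) once per batch rather than once per row.

```r
# Hypothetical helper (not part of the original answer): compute results in
# batches of `batch_size` rows and append each batch to `filename` in one write.
process_in_batches <- function(filename, n_rows = 24, batch_size = 6) {
  if (!file.exists(filename)) {
    file.create(filename)
  }
  for (start in seq(1, n_rows, by = batch_size)) {
    end <- min(start + batch_size - 1, n_rows)
    # Stand-in for the expensive per-row computation
    batch <- data.frame(row = start:end, value = runif(end - start + 1))
    write.table(batch, file = filename, append = TRUE,
                row.names = FALSE, col.names = FALSE)
  }
  invisible(n_rows)
}

# Quick check outside of shiny: 24 rows written in 4 appends
out <- tempfile(fileext = ".txt")
process_in_batches(out, n_rows = 24, batch_size = 6)
n_written <- nrow(read.table(out))
print(n_written)
```

Assuming the same setup as the answer, such a function could be handed to the background process with something like `r_bg(process_in_batches, args = list(filename = "output.txt"))`. A sensible batch size depends on how long one row takes to compute and how often the UI should refresh.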

As an alternative approach I'd recommend library(ipc) which lets you set up continuous communication between R processes. Also check my answer here on async progressbars.

Result using library(callr):

[image: callr]


Result using library(promises): (code by @antoine-sac) - blocked shiny session

[image]



Edit: Here is another approach utilizing library(ipc). It avoids reactiveFileReader, so no file handling is required in the code:

library(shiny)
library(ipc)
library(future)
library(data.table)
plan(multisession) # `multiprocess` is deprecated in current releases of future

ui <- fluidPage(

  titlePanel("Inter-Process Communication"),

  sidebarLayout(
    sidebarPanel(
      textOutput("random_out"),
      p(),
      actionButton('run', 'Start processing')
    ),

    mainPanel(
      tableOutput("result")
    )
  )
)

server <- function(input, output) {

  queue <- shinyQueue()
  queue$consumer$start(100)

  result_row <- reactiveVal()

  observeEvent(input$run,{
    future({
      for(i in 1:10){
        Sys.sleep(1)
        result <- data.table(t(runif(10, 1, 10)))
        queue$producer$fireAssignReactive("result_row", result)
      }
    })

    NULL
  })

  resultDT <- reactiveVal(value = data.table(NULL))

  observeEvent(result_row(), {
    resultDT(rbindlist(list(resultDT(), result_row())))
  })

  random <- reactive({
    invalidateLater(200)
    runif(1)
  })

  output$random_out <- renderText({
    paste("Something running in parallel", random())
  })

  output$result <- renderTable({
    req(resultDT())
  })
}

shinyApp(ui = ui, server = server)
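
The accumulation step in the code above relies on how a `reactiveVal` is read and written: calling it with no argument reads the current value, and calling it with an argument sets a new one. The following sketch shows that pattern in isolation, outside of a running app. It is a simplified illustration rather than the answer's code: `result_df` and `append_row` are hypothetical names, a base `data.frame` with `rbind()` replaces `data.table` and `rbindlist()`, and `isolate()` is used only because there is no reactive context at the console.

```r
library(shiny)

# Accumulator playing the role of `resultDT` in the answer
result_df <- reactiveVal(data.frame())

# Hypothetical helper: append one row to the accumulator
append_row <- function(row) {
  # isolate() allows reading the reactiveVal outside a reactive context
  current <- isolate(result_df())
  # Calling the reactiveVal with an argument assigns the new value
  result_df(rbind(current, row))
}

for (i in 1:3) {
  append_row(data.frame(i = i, value = runif(1)))
}

n_rows <- nrow(isolate(result_df()))
print(n_rows)
```

Inside the app, the `observeEvent(result_row(), ...)` observer does the same thing each time the queue assigns a new `result_row`, which is what makes the table grow one row at a time.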

To clean up the discussion I've had with @antoine-sac for future readers: On my machine using his code I was indeed experiencing a direct interconnection between the long running code (sleep time) and the blocked UI:

[image: blocking example]

However, the reason for this was not that forking is more expensive depending on the OS or when using Docker, as @antoine-sac stated. The problem was a lack of available workers. As stated in ?multiprocess:

workers: A positive numeric scalar or a function specifying the maximum number of parallel futures that can be active at the same time before blocking.

The default is determined via availableCores(), although on a Windows machine plan(multiprocess) defaults to multisession evaluation.

Accordingly the discussion was triggered by a lack of configuration and different defaults used due to the underlying hardware.
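
To make the point about workers concrete, here is a minimal sketch of setting the number of workers explicitly instead of relying on the default. It assumes `plan(multisession)` rather than the `multiprocess` plan used in this thread, and the worker count of 2 is an arbitrary value chosen for illustration.

```r
library(future)

# Number of cores future detects on this machine (the default worker count)
print(availableCores())

# Request a fixed number of workers explicitly
plan(multisession, workers = 2)
n_workers <- nbrOfWorkers()
print(n_workers)

# With 2 workers, both futures start without blocking the calling process;
# creating a third one would block until a worker becomes free.
f1 <- future({ Sys.sleep(1); "first" })
f2 <- future({ Sys.sleep(1); "second" })
results <- c(value(f1), value(f2))
print(results)

# Shut the background workers down again
plan(sequential)
```

In the reproduction code, ten futures are created in a loop, so with fewer than ten workers the loop itself blocks the session until earlier futures finish. That matches the behaviour described above.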

Here is the code to reproduce the gif (based on @antoine-sac's first contribution):

library(shiny)
library(future)
library(promises)
plan(multiprocess)
# plan(multiprocess(workers = 10))

ui <- fluidPage(
  titlePanel("title"),
  sidebarLayout(
    sidebarPanel(
      p(textOutput("random")),
      p(numericInput("sleep", "Sleep time", value = 5)),
      p((actionButton(inputId = "button", label = "make table"))),
      htmlOutput("info")
    ),
    mainPanel(
      uiOutput("table")
    )
  )
)

makeTable <- function(nrow, input){
  filename <- tempfile()
  file.create(filename)
  for (i in 1:nrow) {
    future({
      # expensive operation here
      Sys.sleep(isolate(input$sleep))
      matrix(c(i, runif(10)), nrow = 1)
    }) %...>%
      as.data.frame() %...>%
      readr::write_csv(path = filename, append = TRUE)
  }

  reactiveFileReader(intervalMillis = 100, session = NULL,
                     filePath = filename,
                     readFunc = readr::read_csv, col_names = FALSE)
}

server <- function(input, output, session){
  timingInfo <- reactiveVal()
  output$info <- renderUI({ timingInfo() })

  output$random <- renderText({
    invalidateLater(100)
    paste("Something running in parallel: ", runif(1))
  })

  table_reader <- eventReactive(input$button, {
    start <- Sys.time()
    result <- makeTable(10, input)
    end <- Sys.time()
    duration <- end-start
    duration_sleep_diff <- duration-input$sleep
    timingInfo(p("start:", start, br(), "end:", end, br(), "duration:", duration, br(), "duration - sleep", duration_sleep_diff))
    return(result)
  })
  output$table = renderTable(table_reader()()) # nested reactives, double ()
}

shinyApp(ui, server)
ismirsehregal
  • A good but outdated answer: in shiny, it is all abstracted away with `promises` now. https://rstudio.github.io/promises/articles/shiny.html – asachet May 23 '19 at 09:02
  • Can you then please show me how to continuously consume data via `promises`? As far as I know with a `promise` you'll have to wait for the result (async of course). – ismirsehregal May 23 '19 at 09:06
  • You could do it in the same way, writing to a file and using `reactiveFileReader`. You'd simply do `for (i in 1:10) { future({...}) %...>% write.table(...) }`. A small change for sure but I prefer to use "shinyverse" packages. As a side effect, it makes the code parallel. – asachet May 23 '19 at 09:51
  • Well, unfortunately it’s not that easy. Using promises you’ll still block your current shiny session with long running calculations. Please read [this](https://github.com/rstudio/promises/issues/23#issuecomment-386687705). Joe Cheng also mentions a workaround and its downsides (race conditions). Later on, in the thread he suggests using [callr](https://github.com/rstudio/promises/issues/23#issuecomment-386764021). Just as I did here. – ismirsehregal May 23 '19 at 10:59
  • Thanks for the link! But it discusses the limitation of returning a future in a reactive, which is not what I am doing. We are already working around it by writing to a file. That said, interesting that you observe a delay: I don't see one on my Unix machine. But it is not due to the `promises` library. It was because in your code, you loop in your fork (and process sequentially) while I fork for every row (and process in parallel). This may block in environments where forks are slow such as Windows or Docker containers. I have now put the loop in the `future` call and it will work just fine. – asachet May 23 '19 at 13:53
  • Yes, now that is something quite different than your `%...>%` "I'll create a future for every row"-approach. Nevertheless, In my eyes kicking off a background process or setting up a future from within shiny, writing the results to a table and reading them back in via `reactiveFileReader` is not a big difference. There even is library([future.callr](https://cran.r-project.org/web/packages/future.callr/index.html)) available, which uses callr as backend for futures. Accordingly, I don't see the added value of your answer. – ismirsehregal May 23 '19 at 14:05
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/193839/discussion-between-antoine-sac-and-ismirsehregal). – asachet May 23 '19 at 14:07
  • I like the `ipc` approach. For larger objects writing to a file could be quite slow. – Jan Stanstrup Jan 28 '20 at 09:56
  • `resultDT` is a `reactiveVal`. For reactiveVals, enclosing an object in its parentheses assigns a new value to it, just like `<-` in standard R. Please see [this](https://shiny.rstudio.com/reference/shiny/1.3.0/reactiveVal.html) for an example. – ismirsehregal Jan 28 '20 at 10:00
  • @JanStanstrup to work around the file based approach you could e.g. populate an in-memory sqlite database instead of a csv file and use `reactivePoll` over `reactiveFileReader`. – ismirsehregal Jan 28 '20 at 10:05
  • Thanks. You answered the `resultDT` faster than I removed the question after I realized the answer... The database would work too. But I guess it is also file based in the end. Also you cannot write complex objects to sqlite. In the end in my use case I would need to populate a list with S4 objects. The `ipc` approach should be able to do that I assume. It would be nice though if there was a more direct way of doing this with `future`. – Jan Stanstrup Jan 28 '20 at 10:47
  • Ok, I see - that makes things a little more difficult. In the end both processes need to shake hands somehow. Maybe you can write a "heartbeat" message once a certain part of your long running process finishes - and therefore populate your list in batches. – ismirsehregal Jan 28 '20 at 11:18
  • But wouldn't your `ipc` approach work? Instead of `rbind` just add to a list? – Jan Stanstrup Jan 28 '20 at 12:03
  • Sure, that was just a suggestion to reduce harddisk traffic. – ismirsehregal Jan 28 '20 at 12:30

You need asynchronous capabilities. These have been built into shiny since v1.1.

The promises package (which already comes with shiny) offers a simple API to run asynchronous operations in shiny and is designed to play well with reactives.

https://rstudio.github.io/promises/articles/shiny.html

EDIT: Code adapted from @ismirsehregal, refactored and now using futures to handle the parallel processing and async results.

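library(shiny)
library(future)
library(magrittr)   # needed for the %>% pipe used in makeTable()
plan(multisession)  # `multiprocess` is deprecated in current releases of future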

ui <- fluidPage(
  titlePanel("title"),
  sidebarLayout(
    sidebarPanel(
      actionButton(inputId = "button", label = "make table")
    ),
    mainPanel(
      uiOutput("table")
    )
  )
)

makeTable <- function(nrow){
  filename <- tempfile()
  file.create(filename)
  future({
    for (i in 1:nrow) {
        # expensive operation here
        Sys.sleep(1)
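        # The output file is passed by position: the argument is named `file`
        # in current readr releases (`path` is deprecated)
        matrix(c(i, runif(10)), nrow = 1) %>%
          as.data.frame() %>%
          readr::write_csv(filename, append = TRUE)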
    }
  })

  reactiveFileReader(intervalMillis = 100, session = NULL,
                     filePath = filename,
                     readFunc = readr::read_csv, col_names = FALSE)
}

server <- function(input, output, session){

  table_reader <- eventReactive(input$button, makeTable(10))
  output$table = renderTable(table_reader()()) # nested reactives, double ()
}

shinyApp(ui, server)

asachet
  • Using this solution the long running process will [block](https://github.com/rstudio/promises/issues/23#issuecomment-386687705) the current shiny session. – ismirsehregal May 23 '19 at 11:01
  • @ismirsehregal Not really. It only blocks while looping to create the futures. There can be a delay, especially in environments where forking is expensive. If you increase the sleep time, you'll see that the session is not blocked by the row processing. But you have a point: for better responsiveness, it is better to create a single future and loop in it. Although of course you're processing rows sequentially if you do that. – asachet May 23 '19 at 14:03
  • Sure, I'm fine with your answer after your edit. Anyway the sticking point for this question was using `reactiveFileReader` not which async-backend is used. – ismirsehregal May 23 '19 at 14:09
  • Just one more thing. [This](https://github.com/rstudio/promises/issues/23#issuecomment-386764021) indicates that `library(callr)` is far away from being “outdated” in the shiny context. And your downvote seems to be the result of personal preference (“shinyverse”) rather than a problem with the given answer. – ismirsehregal May 23 '19 at 15:21