
I have an R script that writes out a large amount of data after querying a database. The script pulls dataframes (1M-10M rows) from the database, processes them, and then writes the output to CSV.

It keeps crashing with this message:

The previous R session was abnormally terminated due to an unexpected crash.

I'd like to make it more robust to crashing, or find out why it's crashing. I put all the data querying, munging, and writing into a single function. Since it keeps crashing, I've been restarting it by hand each time.

Are there ways I can automatically retry the script after it crashes? Or are there bigger design challenges that I should be taking on?
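For example, something along these lines is what I'm imagining, though I don't know if it's the right approach. This is only a sketch: it assumes the `callr` package, and `run_export` is a stand-in for my actual query/munge/write function.

```r
library(callr)

# Placeholder for the real function that queries the database, munges the
# result, and writes the CSV
run_export <- function(start_date) {
  # ... query, process, write.csv(...) ...
  invisible(TRUE)
}

safe_run <- function(start_date, max_tries = 5) {
  for (i in seq_len(max_tries)) {
    result <- tryCatch(
      # Run the work in a separate R process; if that process segfaults or is
      # killed, callr::r() signals an error here instead of taking this
      # session down with it
      callr::r(run_export, args = list(start_date = start_date)),
      error = function(e) e
    )
    if (!inherits(result, "error")) return(invisible(result))
    message("Attempt ", i, " failed: ", conditionMessage(result))
  }
  stop("run_export() still failing after ", max_tries, " attempts")
}
```

As I understand it, a `tryCatch` inside the same session can't survive the kind of crash above, but an error from a crashed child process can be caught in the parent.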

Edit:

I updated the original script from tibbles to data.tables. It's much faster and crashes less frequently, but it still crashes. I'd like to set up logic like this (a rough sketch follows the list):

  1. Start running a particular function in the script and keep running it
  2. If it crashes, try again beginning with the most recent successful "start date" (an argument in the function)
  3. Check if it has crashed every 5-10 minutes
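
Roughly, I'm picturing something like the sketch below. It isn't working code: `process_batch` is a placeholder for my real per-date function, the checkpoint file name is made up, and it leans on the `callr` idea above.

```r
library(callr)

checkpoint_file <- "last_successful_start_date.rds"

run_with_retries <- function(process_batch, start_dates, poll_minutes = 5) {
  # Resume from the most recent successful start date, if there is one
  if (file.exists(checkpoint_file)) {
    last_done <- readRDS(checkpoint_file)
    start_dates <- start_dates[start_dates > last_done]
  }

  for (i in seq_along(start_dates)) {
    d <- start_dates[i]
    repeat {
      ok <- tryCatch({
        # Run one batch in a child R process; a crash there becomes a
        # catchable error here
        callr::r(process_batch, args = list(start_date = d))
        TRUE
      }, error = function(e) {
        message("Batch for ", d, " crashed: ", conditionMessage(e))
        FALSE
      })
      if (ok) {
        saveRDS(d, checkpoint_file)  # record the last successful start date
        break
      }
      Sys.sleep(poll_minutes * 60)   # wait before retrying (step 3)
    }
  }
}
```

The checkpoint file is what would let a restart resume from the most recent successful start date instead of starting over.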

Can I accomplish this with Plumbr?

Cauder
    _"I put all the data querying, munging and writing into a single function"_ - that makes things much harder to debug. A better design is to write lots of little functions that do one thing each. These can all be called by a larger co-ordinating function for ease of use later once you are happy each little function does what it is supposed to do reliably. It is very likely that a single part of the function isn't working as expected and could be protected by a `tryCatch`, but right now your question is too broad to give a definitive answer. – Allan Cameron Sep 07 '20 at 08:17
  • I would suggest figuring out why your script terminates the R session. Are there any error messages? You can add a `browser` call in your script and go through it step by step. – Paul van Oppen Sep 07 '20 at 08:20
  • I don't get any error messages and it works when I do it with just one step and then crashes after it finishes running through the script four or five times. Ideally, I'd like the script to run 40 times – Cauder Sep 07 '20 at 08:21
  • Does the crash depend on the size of the data set? Using e.g. `pryr` you can detect the memory size of your object(s). Does the script crash during the analysis phase or during `write.csv`? – Paul van Oppen Sep 07 '20 at 08:23
  • It keeps crashing on this function: `dataframe %>% left_join(dataframe_three, by = "key_a") %>% group_by(key_a, date.x) %>% summarise(key_z = key_z[which.min(abs(date.x - date.y))]) %>% arrange(date.x) %>% rename(day = date.x)` – Cauder Sep 07 '20 at 09:16
  • Could you share a small version of your data set (use e.g. `dput(head(data, 25))`) and your code? Edit your question and add the data. That will allow us to run your code with your data and offer some help if possible. – Paul van Oppen Sep 07 '20 at 21:47
  • I don't want this to become a duplicate of this question https://stackoverflow.com/questions/63774476/what-are-helpful-optimizations-in-r-for-big-data-sets/63775771#63775771 but the data is the same – Cauder Sep 07 '20 at 21:48
  • I updated the description, thanks – Cauder Sep 08 '20 at 02:25
  • My guess would be that you have a 'many-to-many' join, which causes your computer to run out of memory. Maybe check if there are multiple duplicates of 'key_a' in your dataframes 'dataframe' and 'dataframe_three'. You can add the argument `na_matches = "never"` to ensure that you don't join via NAs. Good luck! – Jagge Sep 08 '20 at 09:59
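
Following up on Jagge's many-to-many suggestion above, one quick diagnostic is to count duplicated keys on each side before joining. This is only a sketch assuming `dplyr`; `dataframe`, `dataframe_three`, and `key_a` are the names used in the comments.

```r
library(dplyr)

# How many rows share each key_a on each side of the left_join?
dup_left  <- dataframe       %>% count(key_a, name = "n_left")  %>% filter(n_left > 1)
dup_right <- dataframe_three %>% count(key_a, name = "n_right") %>% filter(n_right > 1)

# Keys duplicated on both sides multiply in the join (n_left * n_right rows each)
inner_join(dup_left, dup_right, by = "key_a") %>%
  mutate(rows_after_join = n_left * n_right) %>%
  arrange(desc(rows_after_join)) %>%
  head(10)
```

If the same `key_a` values have large counts on both sides, the join size multiplies quickly and can exhaust memory, which would explain the session dying without any R-level error.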
