I've been working in RStudio to crawl some websites, and I want to run my code automatically at particular times during the day. I've been using Rcrawler and rvest for the crawling.

The goal is news aggregation from several sites, using different keywords at different times during the day. I'm trying to automate running that script.

Is there a way to do this in R, or should I move to Python for it? I'm using RStudio on Windows.

Megh

2 Answers

0

There is an easy way to do this, though I'm not sure it's the right way. Use a loop and run the script in the background.

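tschedule <- '18:00'
while (TRUE) {
  # Re-read the clock on every pass; polling twice a minute avoids skipping the target minute
  while (format(Sys.time(), '%H:%M') != tschedule) {
    Sys.sleep(30)
  }
  # Your code here
  Sys.sleep(60)  # step past the scheduled minute so the job runs only once per day
}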

The start condition can be modified as needed, but you can see the idea.
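
As a variation on the same idea, the wait can be computed once instead of polling the clock. This is only a minimal sketch: `seconds_until_next` is a hypothetical helper (not part of base R), it assumes schedules are given as "HH:MM" strings in the session's time zone, and it ignores daylight-saving shifts.

```r
# Hypothetical helper: seconds from `now` until the nearest upcoming time
# in `schedules` (a character vector of "HH:MM" strings).
seconds_until_next <- function(schedules, now = Sys.time()) {
  lt <- as.POSIXlt(now)
  now_secs <- lt$hour * 3600 + lt$min * 60 + lt$sec

  # Convert each "HH:MM" string to seconds since midnight
  parts <- strsplit(schedules, ":", fixed = TRUE)
  target_secs <- vapply(
    parts,
    function(p) as.numeric(p[1]) * 3600 + as.numeric(p[2]) * 60,
    numeric(1)
  )

  # Times that have already passed today roll over to tomorrow
  wait <- target_secs - now_secs
  wait[wait <= 0] <- wait[wait <= 0] + 24 * 3600
  min(wait)
}

# Example with a fixed clock: at 17:00 the next slot is 18:00
example_now <- as.POSIXct("2024-01-15 17:00:00", tz = "UTC")
example_wait <- seconds_until_next(c("09:00", "18:00"), now = example_now)
print(example_wait)  # 3600

# Inside the scheduling loop this would replace the polling, e.g.:
#   Sys.sleep(seconds_until_next(c("09:00", "18:00")))
#   # Your code here
```

Passing several times in one vector covers the "different times during the day" case without a second loop.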

Rohit
0

If you are on Linux / Unix, look into cron. There is an R package called cronR that lets you schedule a script to run at specific times or intervals (hourly, daily, etc.). cronR also ships an RStudio add-in that provides a GUI for scheduling.

You need to install and start the cron service on Linux (e.g. using apt-get on Debian or Ubuntu).

You can then write an R script to schedule a job:

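library(cronR)
# Build the shell command that runs the script with Rscript
cmd <- cron_rscript("/home/job.R")
cmd
# Register the command in the crontab to run every hour
cron_add(command = cmd, frequency = 'hourly', id = 'Scrape', description = 'Webscrape')
# Report how many cron jobs are currently scheduled
cron_njobs()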

A simple way to keep adding to an output file is then to use something like write.table with append = TRUE:

write.table(df, "Scrape.csv", sep = ",", col.names = FALSE, row.names = FALSE, append = TRUE)

The job will keep running on schedule until you choose to end it. All it does is execute an entire R script, so all your writing to .csv etc. needs to be in that script.
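
Because the append call above drops column names, the CSV never gets a header row. A minimal sketch of one way around that, assuming a hypothetical helper `append_rows` (not part of any package) that writes the header only when the file does not exist yet; the data and the temporary file are made up for illustration:

```r
# Hypothetical helper: append a data frame to a CSV file, writing the
# header only on the first write (when the file does not exist yet).
append_rows <- function(df, path) {
  first_write <- !file.exists(path)
  write.table(df, path, sep = ",", row.names = FALSE,
              col.names = first_write, append = !first_write)
}

# Example: two "runs" of the scheduled script appending to the same file
path <- tempfile(fileext = ".csv")
append_rows(data.frame(title = "first headline", hits = 1), path)
append_rows(data.frame(title = "second headline", hits = 2), path)

result <- read.csv(path)
print(result)
```

Each scheduled run would call the helper once with that run's results, so the file grows by a few rows per run and can still be read back with read.csv.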

Permafrost