I'm trying to crawl data from a website using RSelenium. The script below runs perfectly in RStudio.

Code

library(RSelenium)
library(rvest)
library(xlsx)
library(XML)
library(RODBC)
library(taskscheduleR)
library(DBI)

# Connect to SQL Server over ODBC using Windows authentication
con <- dbConnect(odbc::odbc(), .connection_string = "Driver={ODBC Driver 11 for SQL Server};server=HG-SOS-MI;database=Data_Testing;trusted_connection=yes")

# Start a Selenium server and browser, then open the page
driver <- rsDriver()
browser <- driver[["client"]]
browser$navigate("www.goal.com/en-us")
browser$maxWindowSize()

Error Log

But when I schedule it using the built-in taskscheduleR, I get the following in the error log:

Loading required package: methods
Warning message:
package 'DBI' was built under R version 3.4.4 
Loading required package: xml2
Loading required package: rJava
Loading required package: xlsxjars
Warning messages:
1: package 'xlsx' was built under R version 3.4.3 
2: package 'rJava' was built under R version 3.4.3 

Attaching package: 'XML'

The following object is masked from 'package:rvest':

xml

Warning message:
package 'taskscheduleR' was built under R version 3.4.3 
checking Selenium Server versions:
BEGIN: PREDOWNLOAD
BEGIN: DOWNLOAD
BEGIN: POSTDOWNLOAD
checking chromedriver versions:
BEGIN: PREDOWNLOAD
BEGIN: DOWNLOAD
BEGIN: POSTDOWNLOAD
checking geckodriver versions:
BEGIN: PREDOWNLOAD
BEGIN: DOWNLOAD
BEGIN: POSTDOWNLOAD
checking phantomjs versions:
BEGIN: PREDOWNLOAD
BEGIN: DOWNLOAD
BEGIN: POSTDOWNLOAD

Error in subprocess::spawn_process(tfile, ...) : 
  could not create process: Access is denied
Calls: rsDriver ... spawn_tofile -> windows_spawn_tofile -> <Anonymous> -> .Call
Execution halted
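
One way to narrow down an "Access is denied" failure like the one above is to record which account and working directory the scheduled R process actually runs under, since these can differ from the interactive RStudio session. The following is a minimal sketch using only base R. `run_logged` and the log file name are hypothetical names introduced for illustration, and the helper only gathers diagnostics; it does not fix the error.

```r
# Hypothetical helper: run `f()` and, on error, append context to a log file.
run_logged <- function(f, log_file = "rsdriver_diagnostics.log") {
  tryCatch(
    f(),
    error = function(e) {
      info <- c(
        paste("time:   ", format(Sys.time())),
        paste("user:   ", Sys.info()[["user"]]),
        paste("workdir:", getwd()),
        paste("tempdir:", tempdir()),
        paste("error:  ", conditionMessage(e))
      )
      cat(info, file = log_file, sep = "\n", append = TRUE)
      NULL  # signal failure to the caller without halting the script
    }
  )
}

# Usage in the scheduled script (requires RSelenium):
# driver <- run_logged(function() RSelenium::rsDriver())
# if (is.null(driver)) quit(status = 1)
```

Comparing the logged user and directories against the values seen in RStudio may show whether the scheduled task runs in a context that is not allowed to start the driver binaries.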

sessionInfo()

R version 3.5.0 (2018-04-23)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252   
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] subprocess_0.8.2  odbc_1.1.6        shiny_1.1.0       taskscheduleR_1.1 RODBC_1.3-15     
 [6] stringr_1.3.1     rlist_0.4.6.1     plyr_1.8.4        rvest_0.3.2       xml2_1.2.0       
[11] RSelenium_1.7.1   XML_4.0-0         RPostgreSQL_0.6-2 DBI_1.0.0        

loaded via a namespace (and not attached):
 [1] promises_1.0.1    bitops_1.0-6      bit_1.1-14        pkgconfig_2.0.1   blob_1.1.1       
 [6] compiler_3.5.0    xtable_1.8-2      wdman_0.2.2       Rcpp_0.12.17      httr_1.3.1       
[11] tools_3.5.0       openssl_1.0.1     R6_2.2.2          semver_0.2.0      assertthat_0.2.0 
[16] curl_3.2          digest_0.6.15     mime_0.5          miniUI_0.1.1.1    stringi_1.1.7    
[21] caTools_1.17.1    htmltools_0.3.6   hms_0.4.2         bit64_0.9-7       data.table_1.11.4
[26] httpuv_1.4.4.1    binman_0.1.0      rlang_0.2.1       magrittr_1.5      rappdirs_0.3.1   
[31] yaml_2.1.19       later_0.7.3       jsonlite_1.5     

Error in Docker

> rD <- rsDriver()

checking Selenium Server versions:
BEGIN: PREDOWNLOAD
BEGIN: DOWNLOAD
BEGIN: POSTDOWNLOAD
checking chromedriver versions:
BEGIN: PREDOWNLOAD
BEGIN: DOWNLOAD
BEGIN: POSTDOWNLOAD
checking geckodriver versions:
BEGIN: PREDOWNLOAD
BEGIN: DOWNLOAD
BEGIN: POSTDOWNLOAD
checking phantomjs versions:
BEGIN: PREDOWNLOAD
BEGIN: DOWNLOAD
BEGIN: POSTDOWNLOAD

[1] "Connecting to remote server"

Selenium message:unknown error: DevToolsActivePort file doesn't exist
donmoy
  • Which version of R do you have? – Mislav Jun 25 '18 at 09:26
  • @Mislav I use R version 3.4.2 (2017-09-28) – donmoy Jun 25 '18 at 11:06
  • Then you have to upgrade, ideally to 3.5. – Mislav Jun 25 '18 at 12:49
  • @Mislav I've upgraded to R version 3.5.0 and RStudio version 1.1.453. However, the error in the log is still the same. – donmoy Jun 25 '18 at 13:38
  • can you provide output of `sessionInfo()`? – Mislav Jun 25 '18 at 19:24
  • @Mislav I've added the output in the question as requested – donmoy Jun 26 '18 at 09:08
  • It is recommended to run RSelenium through Docker or using the wdman package. Have you tried these? – Mislav Jun 26 '18 at 09:46
  • I've tried docker and I'm getting the error I posted above. I think it has to do with the chromedriver but I can't seem to fix it. – donmoy Jun 27 '18 at 08:14
  • Can you try `chrome_driver <- wdman::chrome(); driver <- remoteDriver(browserName = "chrome", port = 4567L); driver$open()`? – Mislav Jun 27 '18 at 09:29
  • chrome_driver <- wdman::chrome() checking chromedriver versions: BEGIN: PREDOWNLOAD BEGIN: DOWNLOAD BEGIN: POSTDOWNLOAD > driver <- remoteDriver(browserName = "chrome", port = 4567L) > driver$open() [1] "Connecting to remote server" Selenium message:unknown error: DevToolsActivePort file doesn't exist (Driver info: chromedriver=2.40.565383 (76257d1ab79276b2d53ee976b2c3e3b9f335cde7),platform=Linux 4.9.87-linuxkit-aufs x86_64) – donmoy Jun 27 '18 at 10:16
  • I'm also thinking, maybe it can't locate the chromedriver path. Maybe I need to specify this. I'm new to docker. Struggling to locate the directories – donmoy Jun 27 '18 at 10:26
  • Have you added chrome driver to environment path? – Mislav Jun 27 '18 at 10:27
  • `rstudio@f7953541190a:~$ ls /home/rstudio/bin` lists: chromedriver chromedriver.exe phantomjs. I've added this to my working directory and also to "C:\Program Files\Docker\Docker\resources\bin", but the error still persists. – donmoy Jun 27 '18 at 11:09
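
For the "DevToolsActivePort file doesn't exist" message discussed in the comments, a commonly cited workaround for Chrome running inside a container is to pass extra startup flags. The sketch below is an assumption, not something confirmed in this thread: the flag set and the helper name `container_chrome_caps` are illustrative, and the port is the 4567 used in the comments above.

```r
# Assumed workaround flags for Chrome running inside a container.
# `container_chrome_caps` is a hypothetical helper name.
container_chrome_caps <- function() {
  list(
    chromeOptions = list(
      args = list("--headless", "--no-sandbox", "--disable-dev-shm-usage")
    )
  )
}

# Usage (requires RSelenium and wdman, with chromedriver listening on 4567):
# chrome_driver <- wdman::chrome()
# driver <- RSelenium::remoteDriver(
#   browserName = "chrome", port = 4567L,
#   extraCapabilities = container_chrome_caps()
# )
# driver$open()
```

Whether these flags resolve the error depends on the container image and the Chrome and chromedriver versions, so treat this as something to try rather than a confirmed fix.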

1 Answer

I ran into the same problem and solved it through the Windows Task Scheduler.

1. First, create the scheduled task with the taskscheduler_create function.

library(taskscheduleR)

# rscript expects the full path to the R script you want to schedule
Jobplanet <- "Full path to your R script"
taskscheduler_create(taskname = "R03_Jobpnanet_Review_Crawling", rscript = Jobplanet,
                     schedule = "MONTHLY", starttime = "17:20",
                     startdate = format(Sys.Date(), "%Y/%m/%d"), days = 31)

2. Then open the Windows Task Scheduler and double-click the task you created.

3. Check "Run whether user is logged on or not" and set the "Configure for" option to "Windows 7, Windows Server 2008 R2".

Give it a try. It works well for me.
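
After changing the task settings, it may help to trigger the task by hand and read its log instead of waiting for the monthly schedule. This is a minimal sketch, assuming the task name from step 1 and a hypothetical script path. The log location follows taskscheduleR's convention of writing a .log file next to the script, and the five-second wait is an arbitrary choice.

```r
# Hypothetical values; replace with your own script path and task name.
script_path <- "C:/path/to/your_script.R"
task_name   <- "R03_Jobpnanet_Review_Crawling"

# taskscheduleR redirects output to <script name>.log next to the script.
log_path <- paste0(tools::file_path_sans_ext(script_path), ".log")

# Only attempt to run the task on Windows with taskscheduleR installed.
if (.Platform$OS.type == "windows" &&
    requireNamespace("taskscheduleR", quietly = TRUE)) {
  # Start the scheduled task immediately.
  taskscheduleR::taskscheduler_runnow(taskname = task_name)

  # Give the task a moment to start, then show whatever it has logged so far.
  Sys.sleep(5)
  if (file.exists(log_path)) {
    writeLines(readLines(log_path, warn = FALSE))
  }
}
```

If the log no longer ends with "could not create process: Access is denied", the changed task settings have taken effect for that run.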

서한솔