I am trying to fetch text data (e.g. dealer type, dealer name) from Google Maps URLs. I have written the following code for this:

library(RSelenium)
library(XML)
library(xlsx)

# Read the customer list and build one Google Maps URL per address
test <- read.xlsx("C:\\Selenium_Tool\\Segmentaion_Files\\test.xlsx", 1)
test$addr <- paste(test$CUST_ADDRESS, test$CUST_CITY, test$CUST_STATE, sep = ",")
test$URL <- paste("https://www.google.co.in/maps/place/", test$addr)
View(test)

# Start the Selenium server and open a browser session
rd <- rsDriver(port = 4567L,
               browser = c("chrome", "firefox", "phantomjs", "internet explorer"),
               version = "latest", chromever = "latest", geckover = "latest",
               iedrver = NULL, phantomver = "2.1.1", verbose = TRUE, check = TRUE)
remDr <- rd[["client"]]

# Visit each URL and store the text of the first matching element
for (i in 1:length(test$CUST_ID)) {
  remDr$navigate(test$URL[i])
  webElem <- remDr$findElements(using = 'class', 'section-listbox')
  elem <- webElem[[1]]  # the reported error is raised on this line
  class(elem)
  test$result[i] <- elem$getElementText()[[1]]
}

# Close the browser and stop the server
remDr$close()
rd[["server"]]$stop()

In test.xlsx, I have a few Google Maps URLs. When I run this code for 100 URLs, it works fine, but when I run it for more than 100 URLs, it fails with this error: Error in webElem[[1]] : subscript out of bounds

Please help me sort this out.

  • Using this code, I get all the text data from that particular URL, whereas I want only specific data. But for now, my main concern is just getting the code to run smoothly... – Sourabh Bakliwal Dec 12 '17 at 11:58
  • You should consider using the "appropriate" Google API. Scraping is against ToS anyway but at least you won't be rate limited for anything under several thousand requests. – bmrn Dec 12 '17 at 12:30
  • @bmrn but there is a possibility of the IP getting blocked if it sends multiple requests in a short duration – amrrs Dec 12 '17 at 12:34
  • @amrrs this possibility still exists outside of the API. In fact it may be the cause of the author's problem. – bmrn Dec 12 '17 at 12:49
  • But I am just trying to get the shop type (e.g. car dealer, wine shop, etc.). So I don't think I really need to add an API here. What do you say? – Sourabh Bakliwal Dec 13 '17 at 12:06
  • The feature request to make this info available via API: https://issuetracker.google.com/issues/35822953 – xomena Dec 14 '17 at 12:13
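
The "subscript out of bounds" error means findElements() returned an empty list for at least one page, so webElem[[1]] had nothing to index. Whether that happens because the page had not finished rendering or because requests were being throttled, as the comments suggest, is not established here. Either way, a guarded lookup avoids the crash. The following is only a minimal sketch: get_first_text, its tries and wait_secs parameters, and the fake element used to exercise it are hypothetical names introduced for illustration, and the retry counts are arbitrary.

```r
# Hypothetical helper: call a finder function until it returns at least one
# element, then return the text of the first one. Gives up with NA after
# `tries` attempts instead of raising "subscript out of bounds".
get_first_text <- function(find_elements, tries = 5, wait_secs = 2) {
  for (attempt in seq_len(tries)) {
    found <- find_elements()
    if (length(found) > 0) {
      return(found[[1]]$getElementText()[[1]])
    }
    Sys.sleep(wait_secs)  # give the page (or the server) time before retrying
  }
  NA_character_
}

# Stand-in for a web element, so the helper can be tried without a browser.
fake_elem <- list(getElementText = function() list("Car dealer"))

# Finder that returns nothing on the first two calls and one element afterwards.
calls <- 0
flaky_finder <- function() {
  calls <<- calls + 1
  if (calls < 3) list() else list(fake_elem)
}

get_first_text(flaky_finder, tries = 5, wait_secs = 0)  # returns "Car dealer"

# Assumed usage inside the loop from the question (needs a live session):
# test$result <- NA_character_
# for (i in seq_along(test$URL)) {
#   remDr$navigate(test$URL[i])
#   test$result[i] <- get_first_text(function() {
#     remDr$findElements(using = "class name", value = "section-listbox")
#   })
# }
```

Rows that still come back as NA after the run can be inspected or retried separately, which also shows whether the failures cluster after a certain number of requests.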

0 Answers