Suppose I want to get the information about Amenities from this webpage (https://www.airbnb.com/rooms/6676364). This works, but only for the visible part of the list. How can I extract the rest, which is hidden behind the "+More" button?

I tried selecting the node I found in the page source with xpathSApply, but it only returns "+more". Do you know how to solve this problem?

My RSelenium approach:

url <- "https://www.airbnb.com/rooms/12344760"
library('RSelenium')
pJS <- phantom()
library('XML')
shell.exec(paste0("C:\\Users\\Daniil\\Desktop\\R-language,Python\\file.bat"))
Sys.sleep(10)

checkForServer()
startServer()
remDr <- remoteDriver(browserName="chrome", port=4444)
remDr$open(silent = TRUE)

remDr$navigate(url)
var <- remDr$findElement('id', 'details')  # element holding the whole details table

vartxt <- var$getElementAttribute("outerHTML")[[1]]
varxml <- htmlParse(vartxt, useInternalNodes = TRUE)
# select the expandable content block and extract its text
Amenities <- xpathSApply(varxml, "//div[@class='expandable-content expandable-content-full']", xmlValue)

This also doesn't work.

YNWA1992
  • You can use [RSelenium](https://cran.r-project.org/web/packages/RSelenium/vignettes/RSelenium-basics.html) to interact with the page, i.e. to click on the '+More' link so that the complete list of amenities is displayed. Then you can pass the page source from RSelenium to xpathSApply, if you like – har07 Jun 08 '16 at 09:57
  • I have also tried this approach, but it doesn't work for me either. Can you provide a little code, if possible? – YNWA1992 Jun 08 '16 at 10:05
  • I have posted an answer explaining the steps I would take to approach this problem, even though I'm not used to coding in R (I have only tried RSelenium [once](http://stackoverflow.com/questions/29713443/scraping-data-from-tripadvisor-using-r/29713938#29713938)) – har07 Jun 08 '16 at 10:22
  • Perhaps we shouldn't be asking others to co-violate [Terms of Service](https://www.airbnb.com/terms). Just b/c you can do something does not mean you should do it. – hrbrmstr Jun 08 '16 at 10:40

1 Answer

After you navigate the RSelenium driver to the target URL, use the following XPath to find the <a> element whose inner text equals '+ More' within the amenities <div>:

remDr$navigate(url)
link <- remDr$findElement(using = 'xpath', "//div[@class='row amenities']//a[.='+ More']")
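
One thing to check if this lookup finds nothing: `.='+ More'` compares the link's full text exactly, so leading or trailing whitespace in the markup would prevent a match. The sketch below illustrates this on a small, hypothetical HTML snippet (the real structure of the Airbnb page is an assumption here), using `normalize-space()` and `contains()` as a more tolerant variant:

```r
library(XML)

# Hypothetical markup, made up for illustration only; the link text is
# surrounded by whitespace, as often happens in rendered pages.
html <- "<div class='row amenities'>
  <a href='#'>
    + More
  </a>
</div>"
doc <- htmlParse(html, asText = TRUE)

# Exact comparison against the raw text of the <a> element
exact <- xpathSApply(doc, "//div[@class='row amenities']//a[.='+ More']", xmlValue)

# Tolerant comparison: whitespace is trimmed and collapsed before comparing,
# and the class attribute only needs to contain 'amenities'
normalized <- xpathSApply(
  doc,
  "//div[contains(@class, 'amenities')]//a[normalize-space(.)='+ More']",
  xmlValue
)

length(exact)       # number of links matched by the exact comparison
length(normalized)  # number of links matched by the tolerant comparison
```

The same tolerant expression can be passed to `remDr$findElement(using = 'xpath', ...)` in place of the one above.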

Then click the link to expand the complete list of amenities:

link$clickElement()

Finally, pass the current page's HTML source to whatever R function you want to use for further processing:

doc <- htmlParse(remDr$getPageSource()[[1]])
....
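
If the element lookup fails with a NoSuchElement error, one possible cause (an assumption, not verified against this page) is that the link is rendered only after the page has finished loading. A minimal sketch of a retry wrapper follows; `retry_find` is a hypothetical helper, not part of RSelenium, and the XPath in the commented usage is likewise only a guess at the page structure:

```r
# Hypothetical helper: call `finder` repeatedly until it stops raising an
# error or until `timeout` seconds have passed, then return its result.
retry_find <- function(finder, timeout = 10, interval = 0.5) {
  deadline <- Sys.time() + timeout
  repeat {
    result <- tryCatch(finder(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    if (Sys.time() >= deadline) stop(result)  # give up and re-raise the last error
    Sys.sleep(interval)
  }
}

# Stand-in for a lookup that fails twice before succeeding, so the helper
# can be tried without a running Selenium server.
attempts <- 0
fake_finder <- function() {
  attempts <<- attempts + 1
  if (attempts < 3) stop("NoSuchElement")
  "element"
}
found <- retry_find(fake_finder, timeout = 5, interval = 0.1)

# Intended usage with a live session (assumes `remDr` is already open):
# link <- retry_find(function() {
#   remDr$findElement(using = "xpath",
#                     value = "//div[contains(@class, 'amenities')]//a[normalize-space(.)='+ More']")
# })
# link$clickElement()
```

This relies on RSelenium reporting a failed lookup as an ordinary R error, which is what the NoSuchElement message suggests.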
har07
  • After your first step, it shows me an error: Summary: NoSuchElement Detail: An element could not be located on the page using the given search parameters. class: org.openqa.selenium.NoSuchElementException – YNWA1992 Jun 08 '16 at 10:30
  • I tried it with //span, but it doesn't work either. Here is the line: + More – YNWA1992 Jun 08 '16 at 10:39