Web scraping in R: how to scrape data hidden behind a "+More" link?

Question

Suppose I want to get the Amenities information from this webpage (https://www.airbnb.com/rooms/6676364). It works fine, but only for the visible part. How can I extract the rest, which is hidden behind the "+More" button?

I tried selecting the node from the page source with xpathSApply, but it only returns "+more". Do you know a solution to this problem?

My RSelenium approach:

url <- "https://www.airbnb.com/rooms/12344760"
library('RSelenium')
pJS <- phantom()
library('XML')
shell.exec(paste0("C:\\Users\\Daniil\\Desktop\\R-language,Python\\file.bat"))
Sys.sleep(10)

checkForServer()
startServer()
remDr <- remoteDriver(browserName="chrome", port=4444)
remDr$open(silent=T)

remDr$navigate(url)
var <- remDr$findElement('id', 'details')  # extract the whole details block

vartxt <- var$getElementAttribute("outerHTML")[[1]]
varxml <- htmlParse(vartxt, useInternalNodes = TRUE)
# the opening quote around the class value was missing in the XPath string
Amenities <- xpathSApply(varxml, "//div[@class = 'expandable-content expandable-content-full']", xmlValue)

This doesn't work either.

You can use [RSelenium](https://cran.r-project.org/web/packages/RSelenium/vignettes/RSelenium-basics.html) to interact with the page, i.e. click the '+More' link to display the complete list of Amenities. Then you can pass the source from RSelenium to xpathSApply, if you like. – har07, Jun 08 '16 at 09:57
I have also tried this approach, but it doesn't work for me either. Could you provide a little code, if possible? – YNWA1992, Jun 08 '16 at 10:05
I have posted an answer explaining the steps I would take to approach this problem, even though I'm not used to coding in R (I have only tried RSelenium [once](http://stackoverflow.com/questions/29713443/scraping-data-from-tripadvisor-using-r/29713938#29713938)). – har07, Jun 08 '16 at 10:22
Perhaps we shouldn't be asking others to co-violate the [Terms of Service](https://www.airbnb.com/terms). Just because you can do something does not mean you should do it. – hrbrmstr, Jun 08 '16 at 10:40

score 1 · Answer 1 · answered Jun 08 '16 at 10:19

After navigating the RSelenium driver to the target URL, use the following XPath to find the <a> element whose inner text equals '+ More' within the amenities <div>:

remDr$navigate(url)
link <- remDr$findElement(using = 'xpath', "//div[@class='row amenities']//a[.='+ More']")

Then click the link to reveal the complete list of amenities:

link$clickElement()

Finally, pass the current page's HTML source to whatever R function you want to use for further processing:

doc <- htmlParse(remDr$getPageSource()[[1]])
....
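
The three steps above can be combined into a rough sketch like the one below. It is not a tested solution for the live page. The helper names `expand_amenities` and `extract_amenities` are hypothetical, the class names are taken from the question and from the XPath above and may not match the current markup, and `remDr` is assumed to be an RSelenium remote driver that is already open. The looser `contains()` / `normalize-space()` match and the implicit wait are assumptions meant to tolerate link text that sits in a nested tag or content that loads late.

```r
library(XML)

# Hypothetical helper: open the page, click every "More" link found inside
# the amenities block, and return the resulting page source as a string.
# `remDr` is assumed to be an already opened RSelenium remoteDriver.
expand_amenities <- function(remDr, url, wait_ms = 10000) {
  # Let findElements() keep retrying for up to wait_ms milliseconds
  remDr$setImplicitWaitTimeout(milliseconds = wait_ms)
  remDr$navigate(url)
  # Assumed XPath: matches on a class fragment and on text containing "More"
  links <- remDr$findElements(
    using = "xpath",
    value = "//div[contains(@class, 'amenities')]//a[contains(normalize-space(.), 'More')]"
  )
  for (link in links) link$clickElement()
  remDr$getPageSource()[[1]]
}

# Hypothetical helper: pull the text of the expanded block out of an HTML string.
extract_amenities <- function(html) {
  doc <- htmlParse(html, asText = TRUE)
  vals <- xpathSApply(
    doc,
    "//div[contains(@class, 'expandable-content-full')]",
    xmlValue
  )
  # xpathSApply() returns an empty list when nothing matches
  trimws(as.character(unlist(vals)))
}

# Usage, assuming a running Selenium server and an open driver:
# html <- expand_amenities(remDr, url)
# Amenities <- extract_amenities(html)
```

`findElements()` returns an empty list instead of raising NoSuchElement when nothing matches, so checking whether `extract_amenities()` returns an empty vector is one way to tell that the selector needs adjusting.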

har07

88,338
12
84
137

After your first step it gives me an error: Summary: NoSuchElement Detail: An element could not be located on the page using the given search parameters. class: org.openqa.selenium.NoSuchElementException – YNWA1992 Jun 08 '16 at 10:30
I tried it with //span, but that doesn't work either. Here is the line: + More – YNWA1992 Jun 08 '16 at 10:39
