I am using RSelenium for web scraping. I have the XPath of a node on a dynamically generated web page. Its child nodes are all of the same kind, but I do not know in advance how many there are. (For instance, searching for a rare item on a shopping website can produce this situation.)

In general, how can I obtain the following information?

1) The number of a node's child nodes. 2) The XPath(s) of those child nodes. My goal is to apply an action to each child node (e.g. fill, check, or click, depending on what kind of node it is).

I can see some XPaths using XPath Helper in Chrome, but beyond that I am completely stuck.

Preferably exemplified using RSelenium. httr + rvest is also acceptable.

Bill Huang

1 Answer

An rvest solution would be the following:

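library(rvest)
# remDr is assumed to be an already-open RSelenium remote driver
your_xpath <- "YOUR XPATH"
# Parse the currently rendered page, then collect the element children of the node
doc <- read_html(remDr$getPageSource()[[1]])
children <- doc %>% html_node(xpath = your_xpath) %>% html_children()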
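To address the two questions directly, the number of children is the length of the node set, and each child's XPath can be built by appending a positional step to the parent XPath. Below is a minimal sketch on a static HTML string, so it runs without a browser. The HTML snippet, the `results` id, and the variable names `n_children`, `child_xpaths`, and `child_tags` are made up for illustration; with RSelenium you would pass `remDr$getPageSource()[[1]]` to `read_html()` instead.

```r
library(rvest)

# Hypothetical static page standing in for remDr$getPageSource()[[1]]
html <- '<html><body><div id="results">
  <a href="/item/1">First</a>
  <input type="checkbox" name="pick">
  <input type="text" name="qty">
</div></body></html>'
your_xpath <- '//div[@id="results"]'

doc <- read_html(html)
children <- doc %>% html_node(xpath = your_xpath) %>% html_children()

# 1) Number of child elements
n_children <- length(children)

# 2) One positional XPath per child; seq_len() yields nothing when there are no children
child_xpaths <- sprintf("%s/*[%d]", your_xpath, seq_len(n_children))

# Tag name of each child, useful for deciding which action to apply
child_tags <- html_name(children)
```

Note that `html_children()` returns element nodes only, so the positions line up with the `*[i]` steps in the generated XPaths.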

Then you can iterate over the children and do whatever you like with them:

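# classify_node() is a placeholder for a function you write yourself;
# it is not part of rvest or RSelenium.
for (i in seq_along(children)) {
  # Locate the i-th child in the live browser session by its positional XPath
  webElem <- remDr$findElement(using = 'xpath', sprintf("%s/*[%d]", your_xpath, i))
  if (classify_node(children[[i]]) == "click") {
    webElem$clickElement()
  } else {...}  # handle the other node kinds (fill, check) here
}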
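The loop relies on `classify_node()`, which the answer does not define and which is not an rvest or RSelenium function. One possible version, offered only as a sketch, decides on an action from the tag name and the `type` attribute. The action labels ("click", "check", "fill", "skip") and the rules themselves are assumptions to adapt to the page you are scraping.

```r
library(rvest)

# Hypothetical helper: map a single child node to an action label.
classify_node <- function(node) {
  tag  <- html_name(node)
  type <- html_attr(node, "type", default = "")
  if (tag %in% c("a", "button") ||
      (tag == "input" && type %in% c("submit", "button"))) {
    "click"
  } else if (tag == "input" && type %in% c("checkbox", "radio")) {
    "check"
  } else if (tag %in% c("input", "textarea")) {
    "fill"
  } else {
    "skip"
  }
}

# Small made-up page to try the helper on without a browser
html <- '<div id="results">
  <a href="/item/1">First</a>
  <input type="checkbox" name="pick">
  <input type="text" name="qty">
</div>'
kids <- read_html(html) %>%
  html_node(xpath = '//div[@id="results"]') %>%
  html_children()

# Classify each child; kids[[i]] extracts a single node
actions <- vapply(seq_along(kids), function(i) classify_node(kids[[i]]), character(1))
```

In the live session, a "fill" result would typically be followed by `webElem$sendKeysToElement(list("some text"))`, and a "check" result by `webElem$clickElement()`.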
Rentrop
  • Where does classify_node come from? I can't find it anywhere, and Google brings me back here. – MLEN Nov 19 '18 at 21:16