Can I get the text between the quotation marks (the attribute values) using RSelenium?

<note day="12" month="11" year="2002"
to="Tove" from="Jani" heading="Reminder"
body="Don't forget me this weekend!">
</note>

For example, the "12" from the day attribute or the "11" from the month attribute.

Thank you!

pogibas

1 Answer

RSelenium is mainly a tool for retrieving content from dynamic websites. Once you have the content, you can parse it using rvest, which builds on the xml2 package.

To get all attributes, use xml_attrs(). Assuming your XML is saved to a file named "mydata.xml":

library(rvest)  # provides the %>% pipe
library(xml2)   # provides read_xml(), xml_find_all(), xml_attrs(), xml_attr()

read_xml("mydata.xml") %>%
  xml_find_all(xpath = "//note") %>%
  xml_attrs()

[[1]]
                            day                           month                            year 
                           "12"                            "11"                          "2002" 
                             to                            from                         heading 
                         "Tove"                          "Jani"                      "Reminder" 
                           body 
"Don't forget me this weekend!" 

Use xml_attr() for individual attributes:

read_xml("mydata.xml") %>%
  xml_find_all(xpath = "//note") %>%
  xml_attr("day")

[1] "12"
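
If the XML is already in memory as a string rather than saved to a file, the same functions should work directly on it. Here is a minimal sketch, assuming the xml2 package is installed; `note_xml`, `doc`, and `note` are hypothetical variable names chosen for illustration:

```r
library(xml2)

# The XML from the question, held in a string instead of a file.
# (note_xml is a hypothetical name; the apostrophe is escaped for R.)
note_xml <- '<note day="12" month="11" year="2002"
to="Tove" from="Jani" heading="Reminder"
body="Don\'t forget me this weekend!">
</note>'

# read_xml() accepts literal XML as well as a file path
doc <- read_xml(note_xml)

# Select the first <note> node, then pull single attributes from it
note <- xml_find_first(doc, "//note")
day <- xml_attr(note, "day")
month <- xml_attr(note, "month")

# Attribute values come back as character strings; convert if numbers are needed
as.integer(c(day, month))
```

xml_attr() returns NA when the attribute is absent, so it is worth checking the result before converting.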
neilfws
  • Thank you for the response, but I'm using it in a dynamic web page. When using rvest I get: Error in doc_parse_raw(x, encoding = encoding, base_url = base_url, as_html = as_html, : AttValue: " or ' expected [39] – Jorge Ruben Acedo Glaros Jan 30 '18 at 20:33
  • OK, my answer stands: you'd use `RSelenium` to get the content and `rvest` to parse it. We need to see fully reproducible code to debug your error. – neilfws Feb 02 '18 at 03:43
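
The split described in the last comment (the browser session fetches, the parser extracts) might look roughly like the sketch below. It is not a diagnosis of the error above. `attr_from_source` is a hypothetical helper, and `remDr` is assumed to be an already-open RSelenium remote driver, so that part is left as a comment and a literal string stands in for the page source. Because page source from a browser is HTML, the sketch uses read_html(), which is more forgiving of markup that is not well-formed XML.

```r
library(xml2)

# Hypothetical helper: parse page source (a single string) and return one
# attribute from the first node matching `xpath`, or NA if nothing matches.
attr_from_source <- function(page_source, xpath, attr) {
  doc <- read_html(page_source)
  node <- xml_find_first(doc, xpath)
  xml_attr(node, attr)
}

# With a live RSelenium session (assumed, not created here), the string
# would come from the browser:
#   page_source <- remDr$getPageSource()[[1]]
# A literal string stands in for it in this sketch.
page_source <- '<html><body><note day="12" month="11" year="2002"></note></body></html>'

attr_from_source(page_source, "//note", "day")
attr_from_source(page_source, "//note", "month")
```

Keeping the parsing step in a function that only takes a string means it can be tested without starting a browser.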