I have a problem extracting attributes from an XML file in R. The XML file looks as follows:

- <export>
  + <ExportRef>
  - <BookNodes>
      - <Book label="romance">
        + <Showing>
        - <Data>
             + <Char1 label="Char1">
             - <Char2 label="Char2">
                   + <SubChar21>
                   - <SubChar22>
                        <Range unit="nm">4</Range>
                        <Range unit="nm">8</Range>
                     </SubChar22>
             - <Char3 label="Char3">
                   + <SubChar31>
                   - <SubChar32>
                        <Range Id="1">voc</Range>
                        <Range Id="2">buc</Range>
                     </SubChar32>
          </Data>
      </Book>
      - <Book label="horror">
        + <Showing>
        - <Data>
             + <Char1 label="Char1">
             - <Char2 label="Char2">
                   + <SubChar21>
                   - <SubChar22>
                        <Range unit="nm">4</Range>
                        <Range unit="nm">8</Range>
                     </SubChar22>
             - <Char3 label="Char3">
                   + <SubChar31>
                   - <SubChar32>
                        <Range Id="1">voc</Range>
                        <Range Id="2">buc</Range>
                     </SubChar32>
          </Data>
      </Book>
    </BookNodes>
 </export>

I would like to get a list of only the Range Id values for each book category. For example:

romance:

id id
1  2

horror:

id id
1  2

When I do something like this:

RangeID_1 <- xpathSApply(AC_Node[[1]][[2]], ".//Range", xmlAttrs)

I get:

unit unit  id  id
"nm"  "nm" "1"  "2"

How can I tell R that I only want the Range Id and not the Range unit?

Thank you very much!!

MVachelard
  • That is not an XML file. That is a text copy of an XML file from an XML viewer that allows node expansion. Nobody in their right mind is going to edit that block for you to make it legal XML. – hrbrmstr Feb 17 '17 at 14:39
  • Also, please provide minimal but complete reproducible code, including all library statements and the code to input the file. – G. Grothendieck Feb 17 '17 at 15:26

1 Answer

My two cents with rvest:

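library(rvest)  # provides the %>% pipe
library(xml2)   # read_xml(), xml_find_all() and xml_attr() live in xml2
read_xml("your_xml_file.xml") %>%
  # the [@Id] predicate keeps only Range nodes that carry an Id attribute,
  # so the unit-only Range nodes are skipped instead of returning NA
  xml_find_all(".//Range[@Id]") %>%
  xml_attr("Id")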
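
To get the Ids grouped per book, as the question asks, one possible approach is to loop over the Book nodes and apply the XPath predicate `[@Id]` inside each one. The sketch below uses xml2 and a small, well-formed stand-in for the file; the inline document and the names `sample_xml`, `books`, and `range_ids` are assumptions made for illustration, not part of the original data. The same predicate should also work with the XML package, for example `xpathSApply(node, ".//Range[@Id]", xmlGetAttr, "Id")`.

```r
library(xml2)

# Hypothetical, well-formed stand-in for the file in the question
# (the collapsed nodes from the viewer are simply left out).
sample_xml <- '<export>
  <BookNodes>
    <Book label="romance">
      <Data>
        <Char2 label="Char2"><SubChar22>
          <Range unit="nm">4</Range>
          <Range unit="nm">8</Range>
        </SubChar22></Char2>
        <Char3 label="Char3"><SubChar32>
          <Range Id="1">voc</Range>
          <Range Id="2">buc</Range>
        </SubChar32></Char3>
      </Data>
    </Book>
    <Book label="horror">
      <Data>
        <Char2 label="Char2"><SubChar22>
          <Range unit="nm">4</Range>
          <Range unit="nm">8</Range>
        </SubChar22></Char2>
        <Char3 label="Char3"><SubChar32>
          <Range Id="1">voc</Range>
          <Range Id="2">buc</Range>
        </SubChar32></Char3>
      </Data>
    </Book>
  </BookNodes>
</export>'

doc   <- read_xml(sample_xml)
books <- xml_find_all(doc, ".//Book")

# For each Book, select only the Range nodes that have an Id attribute
# and pull out that attribute's value.
range_ids <- lapply(books, function(book) {
  xml_attr(xml_find_all(book, ".//Range[@Id]"), "Id")
})

# Name each list element after the book's label attribute.
names(range_ids) <- xml_attr(books, "label")

range_ids
```

With a real file, `read_xml("your_xml_file.xml")` would replace the inline string; the result is a named list with one character vector of Ids per book.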
denrou