0

The below is the xml response i got from the sharepoint I am trying to parse the data and get details in the below format

Output Needed

title port space    datecreat               id
test  8080 100.000 2017-04-21 17:29:23      1
apple  8700 108.000 2017-04-21 18:29:23     2

Input Received

<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <soap:Body>
        <GetListItemsResponse xmlns="http://schemas.microsoft.com/sharepoint/soap/">
            <GetListItemsResult>
                <listitems xmlns:s='uuid:SBDSHDSH-DSJHD' xmlns:dt='uuid:CSDSJHA-DGGD' xmlns:rs='urn:schemas-microsoft-com:rowset' xmlns:z='#RowsetSchema'
                    <rs:data ItemCount="2">
                        <z:row title="test" port="8080" space='100.000' datecreat='2017-04-21 17:29:23' id='1' />
                        <z:row title="apple" port="8700" space='108.000' datecreat='2017-04-21 17:29:23' id='2' />
                    </rs:data>
                </listitems>
            </GetListItemsResult>
        </GetListItemsResponse>
    </soap:Body>
</soap:Envelope>

I am new to R and tried few and none worked .The namespaces and z:row is unable to be detected.

alistaire
  • 42,459
  • 4
  • 77
  • 117
Rafa
  • 487
  • 7
  • 22

3 Answers3

1

Assuming the text is in Lines, one way is just to grep out the z:row lines, replace equal signs with spaces and read that using read.table. The first line reads the rows including some junk columns and the second line removes the junk columns and sets the column names. Note that this will work even if the XML is invalid. No packages are used.

DF <- read.table(text = gsub("=", " ", grep("z:row", Lines, value = TRUE)))
setNames(DF[seq(3, ncol(DF), 2)], unlist(DF[1, seq(2, ncol(DF)-2, 2)]))

giving:

  title port space           datecreat id
1  test 8080   100 2017-04-21 17:29:23  1
2 apple 8700   108 2017-04-21 17:29:23  2

Note: The input is assumed to be:

Lines <- c(" <?xml version=\"1.0\" encoding=\"utf-8\"?>", "        <soap:Envelope xmlns:soap=\"http://schemas.xmlsoap.org/soap/envelope/\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\">", 
"            <soap:Body>", "                <GetListItemsResponse xmlns=\"http://schemas.microsoft.com/sharepoint/soap/\">", 
"                    <GetListItemsResult>", "                            <listitems xmlns:s='uuid:SBDSHDSH-DSJHD' xmlns:dt='uuid:CSDSJHA-DGGD' xmlns:rs='urn:schemas-microsoft-com:rowset' xmlns:z='#RowsetSchema'", 
"                                <rs:data ItemCount=\"2\">", 
"                                    <z:row title=\"test\" port=\"8080\" space='100.000' datecreat='2017-04-21 17:29:23' id='1' />", 
"                                    <z:row title=\"apple\" port=\"8700\" space='108.000' datecreat='2017-04-21 17:29:23' id='2' />", 
"                            </rs:data>", "                        </listitems>", 
"                    </GetListItemsResult>", "                </GetListItemsResponse>", 
"            </soap:Body>", "        </soap:Envelope>")

If, instead, your input were one long newline-separated string called Lines_n, say, then run this first:

Lines <- readLines(textConnection(Lines_n))
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
1

Consider registering the z namespace prefix and use XML's internal variable xmlAttrsToDataframe using the triple colon operator:

library(XML)

txt='<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
 <soap:Body>
  <GetListItemsResponse xmlns="http://schemas.microsoft.com/sharepoint/soap/">
    <GetListItemsResult>
      <listitems xmlns:s=\'uuid:SBDSHDSH-DSJHD\' xmlns:dt=\'uuid:CSDSJHA-DGGD\' xmlns:rs=\'urn:schemas-microsoft-com:rowset\' xmlns:z=\'#RowsetSchema\'>
        <rs:data ItemCount="2">
          <z:row title="test" port="8080" space=\'100.000\' datecreat=\'2017-04-21 17:29:23\' id=\'1\' />
          <z:row title="apple" port="8700" space=\'108.000\' datecreat=\'2017-04-21 17:29:23\' id=\'2\' />
        </rs:data>
      </listitems>
    </GetListItemsResult>
  </GetListItemsResponse>
 </soap:Body>
</soap:Envelope>'

doc <- xmlParse(txt)

namespaces <- c(z="#RowsetSchema")
df <- XML:::xmlAttrsToDataFrame(getNodeSet(doc, path='//z:row', namespaces))

df
#   title port   space           datecreat id
# 1  test 8080 100.000 2017-04-21 17:29:23  1
# 2 apple 8700 108.000 2017-04-21 17:29:23  2
Parfait
  • 104,375
  • 17
  • 94
  • 125
0

That's not valid XML and, while I'm the first person to complain about SharePoint, it — by itself — won't generate something that broken. It's totally possible that a colleague who is whacking at your SharePoint server broke something but it's really hard to break it this badly.

Anyway, this is a valid version of the XML:

<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <soap:Body>
        <GetListItemsResponse xmlns="http://schemas.microsoft.com/sharepoint/soap/">
            <GetListItemsResult>
                <listitems xmlns:s='uuid:SBDSHDSH-DSJHD' xmlns:dt='uuid:CSDSJHA-DGGD' xmlns:rs='urn:schemas-microsoft-com:rowset' xmlns:z='#RowsetSchema'>
                    <rs:data ItemCount="2">
                        <z:row title="test" port="8080" space='100.000' datecreat='2017-04-21 17:29:23' id='1' />
                        <z:row title="apple" port="8700" space='108.000' datecreat='2017-04-21 17:29:23' id='2' />
                    </rs:data>
                </listitems>
            </GetListItemsResult>
        </GetListItemsResponse>
    </soap:Body>
</soap:Envelope>

And, it parses & extracts fine with:

library(xml2)

doc <- read_xml("test.xml")

ns <- xml_ns_rename(xml_ns(doc), d1 = "a")

xml_find_all(doc, ".//z:row") %>% 
  map(xml_attrs) %>% 
  map_df(as.list) 

## # A tibble: 2 × 5
##   title  port   space           datecreat    id
##   <chr> <chr>   <chr>               <chr> <chr>
## 1  test  8080 100.000 2017-04-21 17:29:23     1
## 2 apple  8700 108.000 2017-04-21 17:29:23     2
hrbrmstr
  • 77,368
  • 11
  • 139
  • 205
  • the xml2 library is still not available in our servers and takes it won time to get it installed. But it definitely worked on the local system. Thanks for the response – Rafa Apr 23 '17 at 03:47
  • Why was it necessary to rename ``d1`` namespace to ``a``? – runr May 08 '19 at 13:34