
I have been using R to scrape XML tables from a Microsoft SharePoint page, and I want to use the 'rs:name' values buried in the Schema as the column names, instead of the attribute names in rs:data. I am having trouble accessing these names because they are nested very deep in the XML tree.

I want these names for two reasons. First, they are the full names of the columns in the table on the SharePoint page, not just the XML-encoded names. Second, when I load the data and the table has missing values, entries are shifted across to fill the gaps, often wrapping back to the start.
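The shifting described above is consistent with how base R's rbind() recycles shorter vectors. The following is a minimal sketch of that effect, assuming each row arrives as a named character vector of attributes; the attribute names and values are hypothetical.

```r
library(data.table)

# Two rows as named attribute vectors (hypothetical names and values).
# The second row has no "ows_Owner" attribute.
rows <- list(
  c(ows_Title = "Alpha", ows_Owner = "Ann", ows_Status = "Open"),
  c(ows_Title = "Beta", ows_Status = "Closed")
)

# rbind() matches by position and recycles the shorter vector, so "Closed"
# lands under ows_Owner and "Beta" wraps around into ows_Status.
wrapped <- suppressWarnings(do.call(rbind, rows))

# Binding by name keeps each value in its own column and leaves NA where
# an attribute is absent.
aligned <- rbindlist(lapply(rows, as.list), fill = TRUE)
```

Binding by name rather than by position is the design choice here: it does not depend on every row carrying every attribute.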

Here is a link that I have been following for inspiration: Using R to connect to a sharepoint list

Here is a pretty similar example of the XML (just with the names changed): https://pastebin.com/Ks2LmBS3

My code looks like:

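library(httr)
library(xml2)
library(XML)
library(data.table)
# Fetch the page with NTLM authentication and keep the body as text
page <- GET(url, verbose(), authenticate(username, password, type = "ntlm"))
src <- httr::content(page, as = "text")
# Inspect the tree with xml2, then parse the same text with the XML package
xml_structure(read_xml(src))
xmlData <- xmlParse(src, asText = TRUE, options = HUGE, useInternalNodes = TRUE)
# One list entry per row under the data node, bound row-wise into a table
dataList <- xmlToList(xmlRoot(xmlData)[["data"]])
dataMatrix <- do.call(rbind, dataList)
df <- data.table(dataMatrix)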

However, I want to read the rs:name values in the Schema, use them as the column names, and then populate the table with the remaining data.
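One way this could look is sketched below with xml2 and data.table. It assumes the response follows the usual rowset layout (s:AttributeType elements carrying both name and rs:name, and z:row elements under rs:data); the sample document, its column names, and its values are hypothetical stand-ins for the real page.

```r
library(xml2)
library(data.table)

# Hypothetical, heavily simplified stand-in for the SharePoint response.
sample_xml <- "<xml xmlns:s='uuid:BDC6E3F0-6DA3-11d1-A2A3-00AA00C14882'
  xmlns:rs='urn:schemas-microsoft-com:rowset' xmlns:z='#RowsetSchema'>
  <s:Schema id='RowsetSchema'>
    <s:ElementType name='row' content='eltOnly'>
      <s:AttributeType name='ows_Title' rs:name='Title'/>
      <s:AttributeType name='ows_Project_x0020_Owner' rs:name='Project Owner'/>
      <s:AttributeType name='ows_Status' rs:name='Status'/>
    </s:ElementType>
  </s:Schema>
  <rs:data>
    <z:row ows_Title='Alpha' ows_Project_x0020_Owner='Ann' ows_Status='Open'/>
    <z:row ows_Title='Beta' ows_Status='Closed'/>
  </rs:data>
</xml>"

doc <- read_xml(sample_xml)
ns <- xml_ns(doc)

# Lookup from encoded attribute name to the display name in the schema.
schema <- xml_find_all(doc, "//s:AttributeType", ns)
display_names <- setNames(
  xml_attr(schema, "rs:name", ns),
  xml_attr(schema, "name", ns)
)

# One named list of attributes per row, bound by name so that a missing
# attribute becomes NA instead of shifting the remaining values.
rows <- xml_find_all(doc, "//rs:data/z:row", ns)
df <- rbindlist(lapply(rows, function(r) as.list(xml_attrs(r))), fill = TRUE)

# Rename only the columns that actually appear in the data.
present <- intersect(names(display_names), names(df))
setnames(df, present, unname(display_names[present]))
```

In this sketch the columns end up named after the rs:name values, and the second row has NA under "Project Owner" rather than a shifted value.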

Please let me know if anything is unclear or needs more explanation. Thank you very much in advance for your help!

  • I'm afraid the pastebin link isn't to XML but to what looks like partial XML copied from the IE/Edge XML viewer (the leading `-` before some `<` gives it away), and even with those cleaned up the XML is still missing closing tags and won't validate/read in. – hrbrmstr Nov 29 '18 at 21:53
  • Yes, very true, my apologies. I believe I have solved the problem with the following code: `url <- 'test.xml'` `doc <- xmlParse(url)` `column_names <- as.character(getNodeSet(doc, "//s:AttributeType/@rs:name"))` – Blake List Nov 29 '18 at 22:21
