1

I have some tabular data in SharePoint and I'm trying to read it in R.

I tried a few options posted in Stack Overflow: Using R to connect to a sharepoint list, How to access SharePoint list from R? and Import file from Sharepoint to R

Option 1:

URL <- "https://herbi.xxxx.xxxx.au/site/land/Lists/test_data/AllItems.aspx"
data = xmlParse(readLines(URL))

The results showed

Error: 1: Namespace prefix ie on menuitem is not defined 16: xmlParseEntityRef: no name 22: Specification mandate value for attribute nowr

Option 2

rawData <- getURL(URL, userpwd = "username:password")
xmlData <- xmlParse(rawData, options=HUGE, useInternalNodes=TRUE)

The results showed

Error: 1: EntityRef: expecting ';' 9: Namespace prefix ie on menuitem is not defined 24:

Option 3

sp_import <- function(ListName) {
urlstring <- "https://herbi.xxxx.xxxx.au/site/land/Lists/test_data"
data <- xmlParse(readLines(paste(urlstring, ListName, sep = ""), warn = FALSE))
items <- getNodeSet(data, "//m:properties")
df <- xmlToDataFrame(items, stringsAsFactors = FALSE)
iterate <- nrow(df)
skip <- 1
while (nrow(df) == 1000 * skip) {
data <- xmlParse(readLines(paste(urlstring, ListName, "?$top=1000&$skip=", iterate, sep = ""), warn = FALSE))
items <- getNodeSet(data, "//m:properties")
df <- rbind(df, xmlToDataFrame(items, stringsAsFactors = FALSE))
iterate <- nrow(df)
skip <- skip + 1
}
return(df)
}
test = sp_import("AllItems.aspx")

The results showed

Error in file(con, "r") : cannot open the connection In addition: Warning message: In file(con, "r") : Error in file(con, "r") : cannot open the connection

Option 4 How to access SharePoint list from R?

library(rvest)
webpage = read_html("https://herbi.xxxx.xxxx.au/site/land/Lists/test_data/AllItems.aspx")

Error in open.connection(x, "rb") : HTTP error 401. which is an authentication error.

Option 5 Import file from Sharepoint to R

library(httr)
test <- GET("https://herbi.xxxx.xxxx.au/site/land/Lists/test_data/AllItems.aspx", authenticate("username", "password", "ntlm"))

library(RCurl)
sharepoint_data <- read.table(
  text = getURL(
    "https://herbi.xxxx.xxxx.au/site/land/Lists/test_data/AllItems.aspx",
    userpwd = paste0("username", ":", "password")
  ),
  header = TRUE, 
  sep = "," 
)

But I end up pulling a data frame.

head(sharepoint_data,4)

    X..DOCTYPE.html.PUBLIC....W3C..DTD.XHTML.1.0.Strict..EN
1                                                                                                                                                                                                                                                                                                                                      \thttp://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd>
2                                                                                                                                                                                                                                                                                                                                                 <html dir=ltr class=ms-isBot lang=en-NZ>
3 <head id=ctl00_Head1><meta name=GENERATOR content=Microsoft SharePoint /><meta http-equiv=Content-type content=text/html; charset=utf-8 /><meta http-equiv=X-UA-Compatible content=IE=10 /><meta http-equiv=Expires content=0 /><meta name=msapplication-TileImage content=/_layouts/15/images/SharePointMetroAppTile.png /><meta name=msapplication-TileColor content=#34a6e8 /><title>
4                                                                                                                                                                                                                                                                                                                                                     \tFEMP 2021 Resubmissions - All Items

tail(sharepoint_data, 4)
                                        X..DOCTYPE.html.PUBLIC....W3C..DTD.XHTML.1.0.Strict..EN
1483                                                                                    </script>
1484     <script type=text/javascript src=/_layouts/15/ILDS.Branding.HBRC/masterpage.js></script>
1485                                                                                      </body>
1486                                                                                      </html>

Can anyone tell what I've done wrong? Thanks.

M八七
  • 233
  • 4
  • 11
  • 1
    Sorry, I can't resist...... you pulled the _whole_ web as a dataframe? That is quite the achievement. – QHarr May 11 '21 at 23:38
  • 1&2 - are you missing a namespace `ie` that has to be [added](https://stackoverflow.com/questions/43553658/parse-xml-using-r-having-namespaces)? 3 - there is no sign of that variable being assigned within the code you show 4) You pass no authentication - leave this rvest method 5) Seems to be two different attempts of which we don't know what happened with httr though looks like promising start; the RCurl appears to have successfully authenticated, I am unsure what you mean by `whole` web, however, read.table is intended to work with entire file ?> `Reads a file in table format` – QHarr May 11 '21 at 23:51
  • ....[Each row of the table appears as one line of the file](https://www.rdocumentation.org/packages/utils/versions/3.6.2/topics/read.table) – QHarr May 11 '21 at 23:54
  • @QHarr Thanks, 1&2 - I'm not sure where to add namespace `ie` and what that means. 3 please find my edited code. 4 - Not sure where to add authentication. Sorry not `whole` web, please find my added info. – M八七 May 12 '21 at 04:21

0 Answers0