I have some tabular data in SharePoint and I'm trying to read it in R.
I tried a few options posted in Stack Overflow: Using R to connect to a sharepoint list, How to access SharePoint list from R? and Import file from Sharepoint to R
Option 1:
URL <- "https://herbi.xxxx.xxxx.au/site/land/Lists/test_data/AllItems.aspx"
data = xmlParse(readLines(URL))
The results showed
Error: 1: Namespace prefix ie on menuitem is not defined 16: xmlParseEntityRef: no name 22: Specification mandate value for attribute nowr
Option 2
rawData <- getURL(URL, userpwd = "username:password")
xmlData <- xmlParse(rawData, options=HUGE, useInternalNodes=TRUE)
The results showed
Error: 1: EntityRef: expecting ';' 9: Namespace prefix ie on menuitem is not defined 24:
Option 3
sp_import <- function(ListName) {
urlstring <- "https://herbi.xxxx.xxxx.au/site/land/Lists/test_data"
data <- xmlParse(readLines(paste(urlstring, ListName, sep = ""), warn = FALSE))
items <- getNodeSet(data, "//m:properties")
df <- xmlToDataFrame(items, stringsAsFactors = FALSE)
iterate <- nrow(df)
skip <- 1
while (nrow(df) == 1000 * skip) {
data <- xmlParse(readLines(paste(urlstring, ListName, "?$top=1000&$skip=", iterate, sep = ""), warn = FALSE))
items <- getNodeSet(data, "//m:properties")
df <- rbind(df, xmlToDataFrame(items, stringsAsFactors = FALSE))
iterate <- nrow(df)
skip <- skip + 1
}
return(df)
}
test = sp_import("AllItems.aspx")
The results showed
Error in file(con, "r") : cannot open the connection In addition: Warning message: In file(con, "r") : Error in file(con, "r") : cannot open the connection
Option 4 How to access SharePoint list from R?
library(rvest)
webpage = read_html("https://herbi.xxxx.xxxx.au/site/land/Lists/test_data/AllItems.aspx")
Error in open.connection(x, "rb") : HTTP error 401. which is an authentication error.
Option 5 Import file from Sharepoint to R
library(httr)
test <- GET("https://herbi.xxxx.xxxx.au/site/land/Lists/test_data/AllItems.aspx", authenticate("username", "password", "ntlm"))
library(RCurl)
sharepoint_data <- read.table(
text = getURL(
"https://herbi.xxxx.xxxx.au/site/land/Lists/test_data/AllItems.aspx",
userpwd = paste0("username", ":", "password")
),
header = TRUE,
sep = ","
)
But I end up pulling a data frame.
head(sharepoint_data,4)
X..DOCTYPE.html.PUBLIC....W3C..DTD.XHTML.1.0.Strict..EN
1 \thttp://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd>
2 <html dir=ltr class=ms-isBot lang=en-NZ>
3 <head id=ctl00_Head1><meta name=GENERATOR content=Microsoft SharePoint /><meta http-equiv=Content-type content=text/html; charset=utf-8 /><meta http-equiv=X-UA-Compatible content=IE=10 /><meta http-equiv=Expires content=0 /><meta name=msapplication-TileImage content=/_layouts/15/images/SharePointMetroAppTile.png /><meta name=msapplication-TileColor content=#34a6e8 /><title>
4 \tFEMP 2021 Resubmissions - All Items
tail(sharepoint_data, 4)
X..DOCTYPE.html.PUBLIC....W3C..DTD.XHTML.1.0.Strict..EN
1483 </script>
1484 <script type=text/javascript src=/_layouts/15/ILDS.Branding.HBRC/masterpage.js></script>
1485 </body>
1486 </html>
Can anyone tell what I've done wrong? Thanks.