I'm parsing a Swedish library catalogue using R and the XML package. Through the library's API, I get XML back from a URL containing my query.
I'd like to use XPath queries to parse each record, but every XPath query I run with the XML package returns an empty list, except for "//*". I'm no expert in either XML parsing or XPath, but I suspect the problem lies in the XML that the API returns.
Here is a simple example that fetches a single record from the catalogue:
library(XML)
example.url <- "http://libris.kb.se/sru/swepub?version=1.1&operation=searchRetrieve&query=mat:dok&maximumRecords=1&recordSchema=mods"
doc <- xmlParse(example.url)
# Title, reached by indexing down the tree (this works)
works <- xmlRoot(doc)[[4]][["record"]][["recordData"]][["mods"]][["titleInfo"]][["title"]][[1]]
# The same title via XPath returns an empty list
doesntwork <- getNodeSet(doc, "//title")
# The only XPath expression that returns anything
onlythisworks <- getNodeSet(doc, "//*")
If this has something to do with namespaces (as these answers suggest), all I understand is that the API's response seems to declare namespaces in its opening tag and that I could use them, but this doesn't help:
# Using the xsi namespace declared in the opening tag (doesn't help either):
title <- getNodeSet(xmlRoot(doc), "//xsi:title", namespaces = c(xsi = "http://www.w3.org/2001/XMLSchema-instance"))
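To make the namespace behaviour concrete, here is a minimal, self-contained sketch that uses a small hand-written record instead of the live API response. It assumes, without having checked the actual Libris output, that the MODS elements sit in the default namespace http://www.loc.gov/mods/v3 (declared as xmlns="..." with no prefix); the URI actually in use should be read from the opening tag of the real response. The prefix m and the variable names are arbitrary illustrations, not part of any API.

```r
library(XML)

# Hand-written stand-in for one record. ASSUMPTION: the real Libris response
# declares MODS v3 as the default (unprefixed) namespace in the same way.
toy <- '<mods xmlns="http://www.loc.gov/mods/v3"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <titleInfo><title>An example title</title></titleInfo>
</mods>'
doc <- xmlParse(toy, asText = TRUE)

# XPath 1.0 reads an unprefixed name as "no namespace", so this finds nothing:
# <title> lives in the default MODS namespace.
unprefixed <- getNodeSet(doc, "//title")

# xsi is used for attributes such as xsi:schemaLocation; no element is in that
# namespace, so binding it does not help either.
via_xsi <- getNodeSet(doc, "//xsi:title",
                      namespaces = c(xsi = "http://www.w3.org/2001/XMLSchema-instance"))

# Bind an arbitrary prefix ("m") to the MODS namespace URI and use it on every step.
ns <- c(m = "http://www.loc.gov/mods/v3")
titles <- xpathSApply(doc, "//m:titleInfo/m:title", xmlValue, namespaces = ns)

# Namespace-agnostic alternative: match on the local element name only.
titles_any_ns <- xpathSApply(doc, "//*[local-name() = 'title']", xmlValue)

print(length(unprefixed))
print(length(via_xsi))
print(titles)
print(titles_any_ns)
```

The prefixed query depends on knowing the namespace URI, while the local-name() form ignores namespaces entirely; that is convenient for exploration but can also match same-named elements from other vocabularies.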
Here, again, is the example response I'm trying to parse.