I'm parsing some XML data from Stack Exchange using clojure.data.xml. For example, if I parse the Votes data, it returns a LazySeq containing a HashMap for each row of data.
What I am trying to do is get the values associated with only certain keys for each row, e.g. (get votes [:Id :CreationDate]). I've tried numerous things, most of them leading to casting errors.
The closest I could get to what I need is (doall (map get votes [:Id :CreationDate])). However, the problem I am experiencing now is that I cannot seem to return more than just the first row, i.e. (1 2011-01-19T00:00:00.000).
Here is an MCVE that can be run in any Clojure REPL, or in the Codepad online IDE.
Ideally I would like to return some kind of collection or map that contains the values I need for each row; the end goal is to write them to something like a CSV file. For example, a collection like
(1 2011-01-19T00:00:00.000 2 2011-01-19T00:00:00.000 3 2011-01-19T00:00:00.000 4 2011-01-19T00:00:00.000)
(def votes '({:Id "1",
              :PostId "2",
              :VoteTypeId "2",
              :CreationDate "2011-01-19T00:00:00.000"}
             {:Id "2",
              :PostId "3",
              :VoteTypeId "2",
              :CreationDate "2011-01-19T00:00:00.000"}
             {:Id "3",
              :PostId "1",
              :VoteTypeId "2",
              :CreationDate "2011-01-19T00:00:00.000"}
             {:Id "4",
              :PostId "1",
              :VoteTypeId "2",
              :CreationDate "2011-01-19T00:00:00.000"}))
(println (doall (map get votes [:Id :CreationDate])))
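For what it's worth, here is a minimal sketch of what the call above appears to be doing, and of the per-row shape I am after. When map is given two collections it walks them in step, so get is applied to the first row with :Id and to the second row with :CreationDate, and then stops because the key vector is exhausted. The names sample-votes, wanted-keys, pairwise, rows-as-vectors and rows-as-maps are made up for this illustration, and juxt / select-keys are just one possible way to pull the values out, not necessarily the best one.

```clojure
;; Hypothetical two-row sample with the same shape as the votes seq above.
(def sample-votes
  '({:Id "1", :PostId "2", :VoteTypeId "2", :CreationDate "2011-01-19T00:00:00.000"}
    {:Id "2", :PostId "3", :VoteTypeId "2", :CreationDate "2011-01-19T00:00:00.000"}))

;; Hypothetical name for the keys to keep from every row.
(def wanted-keys [:Id :CreationDate])

;; map over two collections pairs them element by element:
;; (get row-1 :Id) and (get row-2 :CreationDate), then it stops.
(def pairwise (map get sample-votes wanted-keys))
;; => ("1" "2011-01-19T00:00:00.000")

;; One vector of values per row, in the order of wanted-keys.
(def rows-as-vectors (map (apply juxt wanted-keys) sample-votes))

;; One smaller map per row, keeping only wanted-keys.
(def rows-as-maps (map #(select-keys % wanted-keys) sample-votes))

(println rows-as-vectors)
(println rows-as-maps)
```

The vector-per-row form seems like the more natural fit for writing CSV lines, while the map-per-row form keeps the key names attached to the values.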
Additional detail: If this is of any help/interest, the code I am using to get the above lazy seq is as follows:
(ns se-datadump.read-xml
  (:require
    [clojure.data.xml :as xml]))
(def xml-votes
"<votes><row Id=\"1\" PostId=\"2\" VoteTypeId=\"2\" CreationDate=\"2011-01-19T00:00:00.000\" /> <row Id=\"2\" PostId=\"3\" VoteTypeId=\"2\" CreationDate=\"2011-01-19T00:00:00.000\" /> <row Id=\"3\" PostId=\"1\" VoteTypeId=\"2\" CreationDate=\"2011-01-19T00:00:00.000\" /> <row Id=\"4\" PostId=\"1\" VoteTypeId=\"2\" CreationDate=\"2011-01-19T00:00:00.000\" /></votes>")
(defn se-xml->rows-seq
  "Returns a LazySeq from a properly formatted XML string,
  which contains a HashMap for every <row> element with each of its attributes.
  This assumes the standard Stack Exchange XML format, where a parent element contains
  only a series of <row> child elements with no further hierarchy."
  [xml-str]
  (let [xml-records (xml/parse-str xml-str)]
    (map :attrs (-> xml-records :content))))
; this returns a seq identical to the one in the MCVE:
(def votes (se-xml->rows-seq xml-votes))