I'm trying to parallelize the function below. I refactored it from a for statement and used pmap to speed up reading the XML data, which went well. The next bottleneck is in my keep statement. How can I improve performance here?
I've tried (keep #(when (pmap #(later-date? (second %) after) zip) [(first %) (second %)]) zip), but nested #() functions are not allowed. I've also tried wrapping the keep, as well as the call to raw-url-data, in a future, but dereferencing either in the calling function produces nil.
(defn- raw-url-data
  "Parse xmlzip data and return a sequence of URLs/modtime vectors."
  [data after]
  (let [article (xz/xml-> data :url)
        loc     (pmap #(-> (xz/xml-> % :loc xz/text) first) article)
        mod     (pmap #(-> (xz/xml-> % :lastmod xz/text) first
                           parse-modtime) article)
        zip     (zipmap loc mod)]
    (keep #(when (later-date? (second %) after)
             [(first %) (second %)]) zip)))
And here is my later-date? function:
(defn- later-date?
  "Return TRUE if DATETIME is after AFTER or either one is NIL."
  [datetime after]
  (or (nil? datetime)
      (nil? after)
      (time/after? datetime after)))
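For reference, the nested #() restriction mentioned above can be sidestepped by writing the function with fn and destructuring each map entry. The following is only a minimal sketch under stated assumptions: it uses java.time.Instant values in place of the time/after? call so that it runs without extra dependencies, and later-date?* and filter-after are hypothetical names, not part of the code above. Whether pmap pays off here is uncertain, since it only helps when the per-item work outweighs its coordination overhead.

```clojure
;; Hypothetical stand-in for later-date?, using java.time.Instant so the
;; sketch has no external dependencies (assumption: not the original API).
(defn later-date?*
  [^java.time.Instant datetime ^java.time.Instant after]
  (or (nil? datetime)
      (nil? after)
      (.isAfter datetime after)))

(defn filter-after
  "Return [url modtime] pairs whose modtime passes later-date?*.
  The predicate runs in parallel via pmap. The function is written with
  fn rather than #() so no anonymous-function literals are nested."
  [url->modtime after]
  (->> url->modtime
       ;; Each map entry destructures into [url modtime].
       (pmap (fn [[url modtime]]
               (when (later-date?* modtime after)
                 [url modtime])))
       ;; Drop the nils produced for entries that failed the check.
       (remove nil?)
       ;; Force the lazy sequence so the work finishes before returning.
       doall))

(comment
  ;; Example call with made-up data:
  (filter-after {"https://example.com/a" (java.time.Instant/parse "2021-01-01T00:00:00Z")}
                (java.time.Instant/parse "2020-01-01T00:00:00Z")))
```

The doall at the end forces the lazy result inside the function, which matters if the call is later wrapped in a future or timed.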