I've been using docjure to write to Excel files. Mostly I want to append rows to existing files, usually one at a time. When I do this without agents or futures, I load the file, use add-rows! to add the data, and then rewrite the file like this:
(defn append
  "data is in the same format as for create-workbook, i.e. [[\"n\" \"m\"] [1 2] [3 4]]"
  [filename data]
  (let [workbook (load-workbook filename)
        sheet    (select-sheet "Sheet1" workbook)] ; select-sheet takes the sheet name first
    (add-rows! sheet data)
    (save-workbook! filename workbook)))
I make a lot of calls to append, so I found this: http://blakesmith.me/2012/05/25/understanding-clojure-concurrency-part-2.html, which shows you how to use agents to write to a file using future.
First of all, I'm using FileOutputStream instead of FileWriter. That by itself would still work, but the tutorial's example just appends strings to the end of the file with .write and then closes it, whereas I need to rewrite the whole file every time I "append" (I think?), since an .xlsx workbook contains more than just characters.
I don't really know how to set this up, since in the tutorial's logging example write-out returns the updated instance of the BufferedWriter, and I don't know what the equivalent of that would be.
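On the question of what plays the role of the BufferedWriter: one possibility is to let the agent hold the loaded workbook itself, so that each action mutates it and returns it as the new state, and the file is written only once at the end. The following is only a sketch under assumptions. The names add-data, flush-to, and append-all! are hypothetical, the file is assumed to exist already and to contain the named sheet, and the docjure functions are assumed to come from dk.ative.docjure.spreadsheet.

```clojure
(require '[dk.ative.docjure.spreadsheet
           :refer [load-workbook select-sheet add-rows! save-workbook!]])

(defn add-data
  "Agent action (hypothetical name): appends rows to sheet-name and
  returns the workbook, which becomes the agent's new state."
  [workbook sheet-name rows]
  (add-rows! (select-sheet sheet-name workbook) rows)
  workbook)

(defn flush-to
  "Agent action (hypothetical name): writes the whole workbook to filename."
  [workbook filename]
  (save-workbook! filename workbook)
  workbook)

(defn append-all!
  "Loads filename once, queues one add-data action per batch of rows,
  queues a single save, and blocks until the agent has run them all."
  [filename sheet-name row-batches]
  (let [book (agent (load-workbook filename))]
    (doseq [rows row-batches]
      (send-off book add-data sheet-name rows))
    (send-off book flush-to filename)
    (await book)
    filename))
```

Because an agent runs its actions one at a time, the adds never interleave, and the save is queued after all of them. A call would look like (append-all! "blah.xlsx" "Sheet1" [[[1 2]] [[3 4]]]).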
My other option would be to add the data to a vector concurrently (load the file once and keep returning new vectors like [["n" "m"] [1 2] [3 4]] with the data added), but I'm planning on making roughly 10,000 to 100,000 of these calls and that seems like a lot to keep track of... although, to be fair, reading and writing all the data that many times is probably not that great either.
If you have any suggestions on how I can do this, I'd appreciate it. I'd be willing to make calls to Apache POI itself too, if there's a better way to append with that. Thanks.
--- UPDATE ---
I just rewrote the logger example with the file as the agent instead of the output stream, and it seems to work. I'll let you know if it ends up working with docjure/Apache POI.
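(import '(java.io File BufferedWriter FileWriter))

(def logfile (agent (File. "blah.txt")))

(defn write-out [file msg]
  ;; Open in append mode, write, and close on every call.
  (with-open [out (BufferedWriter. (FileWriter. file true))]
    (.write out msg))
  file) ; return the File so it stays the agent's state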
--- UPDATE 2 ---
I got an analogous version written with docjure, but unfortunately it doesn't work. Loading the file happens within write-workbook, which runs in each future (I don't see a way around this if I use a File as the agent, and I don't see another way to do it besides that). Since the futures all run in parallel, most of them read the empty file and add their row to that, so the end result is that most of them overwrite each other.
Ultimately I decided to just add each row vector to an overall data vector and write once. I can do that with just pmap, so it's a lot neater. The one downside is that if something goes wrong, none of the data is written to the file at all. The upside is that the time spent writing is reduced, since there's only one write call. Also, I would otherwise have been loading the whole growing file into memory on every call, which takes time. Memory usage is the same either way.
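The accumulate-then-write approach could look roughly like this. It is a sketch, not my exact code: compute-row is a hypothetical stand-in for whatever produces each row, build-data and write-once! are made-up names, and the sheet name "Sheet1" is assumed.

```clojure
(require '[dk.ative.docjure.spreadsheet
           :refer [create-workbook save-workbook!]])

(defn compute-row
  "Hypothetical stand-in for the real per-row work."
  [i]
  [i (inc i)])

(defn build-data
  "Computes the rows in parallel with pmap (which keeps input order)
  and puts the header row in front."
  [header inputs]
  (into [header] (pmap compute-row inputs)))

(defn write-once!
  "Builds all the data in memory, then creates and saves the workbook
  in a single write."
  [filename header inputs]
  (save-workbook! filename
                  (create-workbook "Sheet1" (build-data header inputs))))
```

For example, (build-data ["n" "m"] (range 3)) gives [["n" "m"] [0 1] [1 2] [2 3]], and (write-once! "blah.xlsx" ["n" "m"] (range 3)) writes that to a new workbook.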
If anyone still wants to answer this, I'd still be interested, but the method in my first update does not work with docjure (each future reads in an empty file and uses that to append to). I'll post that code in case it helps anyone, though. This is the docjure version of the aforementioned tutorial:
(import '(java.io File FileOutputStream))

(def file (agent (File. "blah.xlsx")))

(defn write-out [file workbook]
  (with-open [out (FileOutputStream. file)]
    (.write workbook out))
  file)

(defn write-workbook [file data]
  ;; The load happens here, in the calling thread, not inside the agent
  ;; action, so parallel callers can all load the same stale file.
  (let [filename (.getPath @file)
        workbook (try (load-workbook filename)
                      (catch Exception e (create-workbook "Sheet1" [])))
        sheet    (select-sheet "Sheet1" workbook)]
    (add-rows! sheet data)
    (send file write-out workbook)))

(defn test [file]
  (write-workbook file [["n" "m"]])
  (dotimes [i 5]
    (future (write-workbook file [[i (inc i)]]))))
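The overwriting seems to come from the fact that only the final write goes through the agent, while the load and the add run in each future. A possible fix is to move the whole read-modify-write cycle into the agent action, so the agent serializes all of it. The sketch below shows that idea with a plain EDN text file standing in for the workbook, so it runs without docjure. The names read-rows, append-row, and run-demo are hypothetical; a docjure version would presumably call load-workbook, add-rows!, and save-workbook! inside the action instead of slurp and spit.

```clojure
(require '[clojure.edn :as edn])
(import '(java.io File))

(defn read-rows
  "Reads the vector of rows stored in f, or [] if the file is empty."
  [^File f]
  (if (pos? (.length f))
    (edn/read-string (slurp f))
    []))

(defn append-row
  "Agent action: reads, modifies, and rewrites the file in one step.
  The agent runs actions one at a time, so no action sees a stale file."
  [^File f row]
  (spit f (pr-str (conj (read-rows f) row)))
  f)

(defn run-demo
  "Sends n appends from n futures and returns the rows found in the file."
  [n]
  (let [f    (File/createTempFile "rows" ".edn")
        log  (agent f)
        futs (doall (for [i (range n)]
                      (future (send-off log append-row [i (inc i)]))))]
    (run! deref futs) ; make sure every send-off has been issued
    (await log)       ; then wait for the agent to work through its queue
    (read-rows f)))
```

With this shape, (run-demo 20) should return all 20 rows, although their order depends on which future sends first. It still rereads and rewrites the file on every append, so it fixes the overwriting but not the cost.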
Thanks