
Let's say I have the following code:

(defn multiple-writes []
  (doseq [[x y] (map list [1 2] [3 4])] ;; let's imagine those are paths to files
     (when-not (exists? x y) ;; could be left off, I feel it is faster to check before overwriting
       (write-to-disk! (do-something x y)))))

which I call like this (parameters omitted):

   (go (multiple-writes))

I use go to execute some code "in the background", but I do not know if I am using the right tool here. Some more information about those functions :

  • this is not high-priority code at all. It could even fail - multiple-writes could be seen as a cache-filling function.
  • I consequently do not care about the return value.
  • do-something takes between 100 and 500 milliseconds depending on the input
  • do-something consumes some memory (uses image buffers, some images can be 2000px * 2000px)
  • there are 10 to 40 elements/images to be processed every time multiple-writes is called.
  • every call to write-to-disk will create a new file (or overwrite an existing one, though that should not happen)
  • write-to-disk writes always in the same directory

So I would like to speed things up by executing (write-to-disk! (do-something x y)) in parallel, to go as fast as possible. But I don't want to overload the system at all, since this is not a high-priority task.

How should I go about this?

Note: despite the title, this is not a duplicate of this question, since I don't want to restrict to 3 threads (not saying that the answer can't be the same, but I feel this question differs).

nha
  • Are you sure you want to parallelize `(write-to-disk! (do-something x y))` and not `(do-something x y)`? Parallelizing IO is unlikely to gain anything and can possibly be more costly. – muhuk Feb 18 '16 at 10:58
  • @muhuk I edited (twice) my question; there is now information regarding the disk writes. I am not sure how parallelizing IO could be more costly, and I worry about using too much memory (worst case: 40 * big-image in memory for every request). I think I see your point though: the calculations are independent of the writes, and should not have to wait for them. I am just saying they should not get too far ahead. – nha Feb 18 '16 at 11:17
  • OK, now it makes more sense. Though, if all parallel units take comparable times to finish, you'd still hold pretty much everything in memory at the same time. Then they'd all attack the disk all at once. – muhuk Feb 18 '16 at 11:21
  • @muhuk true, I need something more sophisticated than a simple parallel function. Something like a "calculations queue" and a "write queue", loosely synchronised perhaps (i.e. the calculations could be no more than "X steps ahead", or "Y memory consumed" ahead). There seems to be a general synchronisation pattern behind this, although I can't figure out whether it already exists or not. – nha Feb 18 '16 at 11:24
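For reference, that loosely synchronised two-queue pattern can be sketched with core.async's pipeline-blocking: a pool of workers computes in parallel, a small output buffer caps how far computation runs ahead of a single writer. This is only a sketch; `do-something` and `write-to-disk!` are stubbed, and the pool/buffer sizes are arbitrary:

```clojure
(require '[clojure.core.async :as async])

;; Hypothetical stand-ins for the question's functions.
(defn do-something [x y] [x y])
(defn write-to-disk! [result] nil)

(defn multiple-writes [pairs]
  (let [in  (async/chan)
        out (async/chan 4)]  ; computation may run at most ~4 results ahead
    ;; 3 worker threads run do-something in parallel; out's small buffer
    ;; applies backpressure so computing can't outpace writing by much.
    (async/pipeline-blocking 3 out (map (fn [[x y]] (do-something x y))) in)
    (async/onto-chan in pairs)  ; feed the work, then close in
    (loop []                    ; single consumer = the "write queue"
      (when-some [r (async/<!! out)]
        (write-to-disk! r)
        (recur)))))
```

Since this blocks while draining `out`, it should be called from `async/thread` or a plain `future` rather than from inside a `go` block.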

2 Answers


Consider basing your design on streams or fork/join.

I would have a single component that does IO. Every processing node can then send its results there to be saved. This is easy to model with streams. With fork/join, it can be achieved by not returning the result up the hierarchy but sending it to, e.g., an agent.

If memory consumption is an issue, perhaps you can divide work even more. Like 100x100 patches.
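A minimal sketch of the single-IO-component idea, with an agent as the writer; the question's functions are stubbed here, and the use of plain futures for the compute side is an assumption:

```clojure
;; Hypothetical stand-ins for the question's functions.
(defn do-something [x y] [x y])
(defn write-to-disk! [result] nil)

;; A single agent serialises all disk IO: computations may run in
;; parallel, but writes happen one at a time, in the order sent.
(def writer (agent nil))

(defn multiple-writes [pairs]
  (let [futs (doall (for [[x y] pairs]       ; start computations in parallel
                      (future (do-something x y))))]
    (doseq [fut futs]
      ;; send-off is intended for blocking actions such as disk IO
      (send-off writer (fn [_] (write-to-disk! @fut))))
    (await writer)))  ; block until every queued write has run
```

Note that this starts every computation at once, so the memory concern from the comments still applies; a bounded thread pool would be needed to cap it.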

muhuk
  • Interesting. Agree on the single IO component (I have only one disk after all). I am thinking about using a "write agent" with `send-off` for this now. Maybe there could also be a "computation agent" (not sure, I have never used Clojure agents). – nha Feb 18 '16 at 11:36

Take a look at the claypoole library, which gives some good and simple abstractions filling the void between pmap and fork/join reducers, which would otherwise need to be coded by hand with futures and promises.

With pmap, all results of a parallel batch need to have returned before the next batch is executed, because return order is preserved. This can be a problem with widely varying processing times (be they calculations, http requests, or work items of different "size"). This is what usually slows pmap down to the performance of a single-threaded map, plus unneeded overhead.

With claypoole's unordered pmap and unordered for (upmap and upfor), slower function calls in one thread (core) can be overtaken by faster ones on another, because ordering doesn't need to be preserved, as long as not all cores are clogged by slow calls.

This might not help much if IO to one disk is the only bottleneck, but since claypoole has configurable thread-pool sizes and functions to detect the number of available cores, it will help with restricting the number of cores used.

And where fork/join reducers would optimize CPU usage via work stealing, they might greatly increase memory use, since there is no option to restrict the number of parallel processes without altering the reducer library.
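As a sketch of the above, assuming claypoole (`com.climate.claypoole`) is on the classpath and with the question's functions stubbed, a bounded pool caps how many computations (and thus how many image buffers) are live at once, and upfor hands results over in completion order:

```clojure
(require '[com.climate.claypoole :as cp])

;; Hypothetical stand-ins for the question's functions.
(defn do-something [x y] [x y])
(defn write-to-disk! [result] nil)

(defn multiple-writes [pairs]
  ;; with-shutdown! closes the pool when the body exits;
  ;; 2 is an arbitrary bound, (cp/ncpus) is another option.
  (cp/with-shutdown! [pool (cp/threadpool 2)]
    (doseq [result (cp/upfor pool [[x y] pairs]
                     (do-something x y))]
      ;; Writes happen here on the calling thread, one at a time,
      ;; in completion order rather than input order.
      (write-to-disk! result))))
```

claypoole also offers priority thread pools, which fits the low-priority requirement mentioned in the question.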

NielsK
  • If I replace network with IO in their post, it looks exactly like what I am looking for (it even has priorities)! – nha Feb 18 '16 at 13:21