
I have a pipeline of two functions that are both IO-heavy, running on a collection of items concurrently.

The first, func1, is very common, and often I just want the response of func1 alone. Other times, I'd like to process the result of func1 with some other function, func2.

What are the trade-offs (performance/overhead, idiomatic-ness) between composing Task.async_stream, i.e.

enum
|> Task.async_stream(Mod1, :func1, [])
|> Task.async_stream(Mod2, :func2, [])
...

vs. passing a continuation and using one Task.async_stream for both func1 and func2 i.e.

enum
|> Task.async_stream(Mod1, :func1_then, [&Mod2.func2/arity])
...

where func1_then calls the function parameter (Mod2.func2) at the end of the normal func1 computation?

usernolan

1 Answer

If both functions are IO bound, then there shouldn't be any problem with your first example:

enum
|> Task.async_stream(Mod1, :func1, [])
|> Task.async_stream(Mod2, :func2, [])
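
One detail to keep in mind when chaining the two calls: Task.async_stream emits each result wrapped as {:ok, value}, so the second stage receives that tuple unless it is unwrapped in between. A minimal sketch, assuming trivial stand-ins for Mod1.func1/1 and Mod2.func2/1 (the function bodies below are hypothetical placeholders, not the real IO-heavy functions):

```elixir
# Hypothetical stand-ins for the IO-heavy functions in the question.
defmodule Mod1 do
  def func1(x), do: x * 2
end

defmodule Mod2 do
  def func2(x), do: x + 1
end

results =
  [1, 2, 3]
  |> Task.async_stream(Mod1, :func1, [])
  # Each element leaving the first stage is {:ok, value}; unwrap it
  # so that func2 receives the bare value.
  |> Stream.map(fn {:ok, value} -> value end)
  |> Task.async_stream(Mod2, :func2, [])
  # The second stage wraps its results the same way.
  |> Enum.map(fn {:ok, value} -> value end)
```

The pattern match on {:ok, value} will raise if a task exits, which may or may not be what you want; handling {:exit, reason} explicitly is left out of this sketch.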

If you did want to collapse the two calls, I wouldn't use a continuation style; just pipeline them in a lambda passed to Task.async_stream/3:

enum
|> Task.async_stream(fn x -> x |> Mod1.func1() |> Mod2.func2() end)
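
As a rough, runnable illustration of the single-stream version, here is a sketch with hypothetical placeholder bodies for Mod1.func1/1 and Mod2.func2/1. The max_concurrency and timeout values are arbitrary assumptions chosen for the example, not recommendations:

```elixir
# Hypothetical stand-ins for the IO-heavy functions in the question.
defmodule Mod1 do
  def func1(x), do: x * 2
end

defmodule Mod2 do
  def func2(x), do: x + 1
end

combined =
  [1, 2, 3]
  |> Task.async_stream(
    # Both steps run inside the same task, one task per item.
    fn x -> x |> Mod1.func1() |> Mod2.func2() end,
    max_concurrency: 4,
    timeout: 5_000
  )
  # Results arrive as {:ok, value} tuples, in input order by default.
  |> Enum.map(fn {:ok, value} -> value end)
```

Here each item spawns one task instead of two, and no unwrapping is needed between the steps because func1's return value is piped directly into func2.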

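Alternatively, you might consider using Flow (note that Flow.run/1 executes the flow for its side effects and returns :ok; pipe into Enum.to_list/1 instead if you need the results):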

enum 
|> Flow.from_enumerable()
|> Flow.map(&Mod1.func1/1)
|> Flow.map(&Mod2.func2/1)
|> Flow.run()
Mike Buhot
  • Great answer, glad to have more perspectives on the way to accomplish this. Does Flow introduce additional overhead over Task.async_stream? The two functions are IO bound (reading/writing files), but the collection itself is typically going to be less than 100 items. – usernolan Apr 25 '17 at 01:38
  • To follow up on this, I did some **very naive** benchmarking (using the example found [here](http://stackoverflow.com/a/29674651/1492117)) and found that in 500 runs of each, the average times were as follows: `Composed Task.async: 0.204734s`, `Piped Task.async: 0.21415s`, `Flow: 0.25598s`. Obviously to be taken with a grain of salt for any other context. – usernolan Apr 25 '17 at 02:56
  • The first example is better than the second because it splits the functions into two separate streams; in the second example there is only one stream, so the operations aren't isolated at the same level. – PatNowak Apr 25 '17 at 06:25