
I was reading the following presentation:

http://www.idt.mdh.se/kurser/DVA201/slides/parallel-4up.pdf

and the author claims that the map function is built very well for parallelism (specifically, he supports this claim on page 3, slides 9 and 10).

If one were given the problem of increasing each value of a list by +1, I can see how looping through the list imperatively would require an index value to change and hence cause potential race-condition problems. But I'm curious how the map function better allows a programmer to successfully code in parallel.

Is it due to the way map is recursively defined? So each function call can be thrown to a different thread?

I'm hoping someone can provide some specifics, thanks!

Will Ness
H_1317
  • because each application of the function `f` to an element of the input list is *independent* from any other application to any other element, so they can all be done independently from each other, i.e. in parallel. a hypothetical `par_map` would allocate the storage to back the resulting list, and spark execution of a new thread for each element `e` in the list, providing it the reference to the place which will need to be updated with the result of `f e`. When there are no more active threads, the `map` has finished. Of course you could make each thread work on a block of say 1000 `e`s, too. – Will Ness Nov 21 '17 at 14:56
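For illustration, here is a minimal Scala sketch of the hypothetical `par_map` described in the comment above (the name and the one-thread-per-element scheme are only for exposition; a real implementation would use chunks and a thread pool, as the comment itself notes):

```scala
// Hypothetical par_map: allocate the result storage up front, then spark one
// thread per element. Each thread writes f(e) into its own independent slot,
// so no synchronization between elements is needed. The map is finished when
// no threads remain active (here: when all joins return).
def parMap[A, B](xs: List[A])(f: A => B): List[B] = {
  val input = xs.toArray[Any]
  val out   = new Array[Any](input.length)
  val threads = input.indices.map { i =>
    new Thread(() => out(i) = f(input(i).asInstanceOf[A]))
  }
  threads.foreach(_.start()) // spark all threads
  threads.foreach(_.join())  // wait until every slot has been filled
  out.toList.asInstanceOf[List[B]]
}
```

For example, `parMap(List(1, 2, 3))(_ + 1)` yields `List(2, 3, 4)`, with each `+ 1` computed on its own thread.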

2 Answers


The map function applies the same pure function to the n elements of a collection and aggregates the results. The order in which you apply the function to the members of the collection doesn't matter, because by definition the return value of the function depends only on its input.
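A small Scala illustration of that order independence (the names `leftToRight` and `rightToLeft` are just for this example): because `f` is pure, evaluating the applications in a different order produces the same results in the same positions.

```scala
// With a pure function, each element's result depends only on that element,
// so the evaluation order of the applications is unobservable.
val xs = List(1, 2, 3, 4)
val f  = (x: Int) => x + 1

val leftToRight = xs.map(f)
val rightToLeft = xs.reverse.map(f).reverse // simulate evaluating in reverse order
// Both are List(2, 3, 4, 5).
```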

john_omalley
  • Right, but could one not just split up the list into different chunks and process each chunk on its own thread imperatively with a loop? How would that differ from the above? – H_1317 Nov 18 '17 at 18:17
  • Perhaps nothing functionally... I think it's just a matter of complexity. The algorithm you're describing would be more complex... whereas you could just leverage a library to perform a parallel map. At least that's how I interpret his point. – john_omalley Nov 18 '17 at 18:25
  • I suppose I was more interested in the internals of map. If map is defined recursively then we would have: map(f, list) = if list is empty return the empty list else return f(head_of_list) concat with map(f, list.tail) .... If map is defined recursively, how does that allow map to avoid sequentially computing f(head_of_list) one by one? (like a loop would) – H_1317 Nov 18 '17 at 18:39
  • @H_1317 The point is that if the API only provides tools for looping imperatively over a list, the programmer will have to implement parallelism himself. While the API of map allows the programmer to just supply the function he wants to apply to every element and then internally map can do whatever it wants with it, and handle the parallelism on behalf of the programmer. – Jasper-M Nov 18 '17 at 20:47
  • @Jasper-M Even if the API did not define map, could I not just define my own function called map that internally used a loop? What benefit does this particular recursive map function, if any, have over the loop version? – H_1317 Nov 18 '17 at 20:55
  • @H_1317 I don't think recursion is a requirement. You're the only one who brought that up... I think you misinterpreted what the author of those slides meant to say. – Jasper-M Nov 18 '17 at 21:06
  • just for completeness, this is only true of eager languages, or otherwise for lazy evaluation when the function `f` does not refer to the list of results as it is being built - a valid *lazy* technique, which does reintroduce directionality into this picture; e.g. `partial_sums(xs) = ys where ys = [0] ++ (map2 (+) xs ys)` which is a fancy way of formulating the iteration of `sum += x for each x in xs`, but syntactically still a map. with concurrency, this translates to various schemes for timing schedules for *composed* map operations, either having to wait for *one* cell, or the whole list. – Will Ness Nov 22 '17 at 09:22
  • cf. [this old answer](https://stackoverflow.com/a/11951590/849891) of mine. – Will Ness Nov 22 '17 at 09:40
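For reference, the recursive definition discussed in these comments can be written out in Scala as follows. It is only a sketch, and, as the comments note, evaluated strictly it is just as sequential as a loop; the opportunity for parallelism comes from map's specification (independent applications of `f`), not from this particular definition.

```scala
// The textbook recursive map: compute f(head), then recurse on the tail.
// Under strict evaluation each f(head) is computed one after another,
// exactly like the iterations of a loop.
def seqMap[A, B](xs: List[A])(f: A => B): List[B] = xs match {
  case Nil          => Nil
  case head :: tail => f(head) :: seqMap(tail)(f)
}
```

For example, `seqMap(List(1, 2, 3))(_ * 2)` yields `List(2, 4, 6)`, element by element.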

The others already explained that the standard map implementation isn't parallel.

But in Scala, since you tagged it, you can get the parallel version as simply as

val list = ... // some list
list.par.map(x => ...) // instead of list.map(x => ...)

See also Parallel Collections Overview and documentation for ParIterable and other types in the scala.collection.parallel package.

You can find the implementation of the parallel map at https://github.com/scala/scala/blob/v2.12.1/src/library/scala/collection/parallel/ParIterableLike.scala, if you want (look for `def map` and `class Map`). It requires very non-trivial infrastructure and certainly isn't just taking the recursive definition of sequential map and parallelizing it.
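To give a feel for the hand-rolled chunking discussed in the comments, here is a minimal sketch using `scala.concurrent.Future` (the name `chunkedParMap` is hypothetical; the real library code linked above handles splitting, scheduling, and result combination far more carefully):

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration.Duration
import scala.concurrent.ExecutionContext.Implicits.global

// Split the input into roughly equal chunks, map each chunk on its own
// Future, then reassemble the results in order. This is the bookkeeping a
// programmer must write by hand when only an imperative loop is available.
def chunkedParMap[A, B](xs: Vector[A], chunks: Int)(f: A => B): Vector[B] = {
  val size    = math.max(1, math.ceil(xs.length.toDouble / chunks).toInt)
  val futures = xs.grouped(size).map(chunk => Future(chunk.map(f))).toVector
  Await.result(Future.sequence(futures), Duration.Inf).flatten
}
```

For example, `chunkedParMap(Vector(1, 2, 3, 4, 5), 2)(_ + 1)` yields `Vector(2, 3, 4, 5, 6)`, with the two chunks mapped concurrently.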

If one had defined map via a loop how would that break down?

The slides give F# parallel arrays as the example at the end, and at https://github.com/fsharp/fsharp/blob/master/src/fsharp/FSharp.Core/array.fs#L266 you can see that the non-parallel implementation there is a loop:

let inline map (mapping: 'T -> 'U) (array:'T[]) = 
     checkNonNull "array" array             
     let res : 'U[] = Microsoft.FSharp.Primitives.Basics.Array.zeroCreateUnchecked array.Length 
     for i = 0 to res.Length-1 do  
         res.[i] <- mapping array.[i] 
     res 
Alexey Romanov
  • Thanks Alexey. Would you call the standard recursive definition of map, though still sequential, essential to creating easy-to-use parallelism? If one had defined map via a loop, how would that break down? (I suspect the answers to these questions are maybe larger than a single comment, but any high-level overview would be great!) – H_1317 Nov 18 '17 at 20:54
  • 1
    No, it isn't essential or particularly relevant. The slides aren't talking about the recursive implementation at all, only about specification of `map` being _possible_ to implement in parallel. – Alexey Romanov Nov 18 '17 at 22:57