I have the following module, which simulates a parallel map:

defmodule Parallel do
  def pmap(collection, fun) do
    me = self()

    collection
    |> Enum.map(fn elem ->
      spawn_link(fn -> send(me, {self(), fun.(elem)}) end)
    end)
    |> Enum.map(fn pid ->
      receive do
        {^pid, result} -> result
      end
    end)
  end
end

I compile it, run it, and get the expected result:

iex(5)> Parallel.pmap 1..1000, &(&1 * &1)
[1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225, 256 ...]

When I remove the pin operator from receive do { pid, result } ->, the list is no longer in the right order:

[1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 256, 225, 289, 361...]

Why does the pin operator influence the order?

softshipper

2 Answers


You are starting one thousand concurrent processes. When each one finishes, it sends a message consisting of its PID and its result. The scheduler is non-deterministic, so the messages may be received in any order.

The pin operator means "don't rebind the variable; pattern match against its current value".
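For instance, a minimal sketch of the difference (my own snippet, not part of the original code):

```elixir
x = 1

# Without the pin, `x` is simply rebound to the new value:
{x, y} = {2, 3}
# x is now 2, y is 3

# With the pin, the current value of `x` becomes part of the pattern:
x = 1
{^x, z} = {1, 4}
# matches, z is 4; `{^x, _} = {2, 4}` would raise a MatchError
```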

Let's consider an example where three messages arrive in reverse order:

{pid3, 9}
{pid2, 4}
{pid1, 1}

With {^pid, result} you are matching against a concrete PID, so when the first message arrives, the pattern match fails and the message stays in the mailbox.

When the second one arrives, the same thing happens.

When the third message arrives, it matches; you get the result and move on to matching pid2, which is already in the mailbox. Finally, you match on pid3 and also get it straight from the mailbox.

With {pid, result} you are rebinding the pid variable. When the first message arrives, it matches, and pid is bound to pid3.

In the end, you get a list of results in the order the messages arrived.
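To make this concrete, here is a small self-contained sketch (the delays and values are made up for the demo) showing that pinned receives return results in spawn order even when the messages arrive out of order:

```elixir
parent = self()

# Later elements sleep less, so their messages arrive first:
pids =
  Enum.map([30, 20, 10], fn delay ->
    spawn_link(fn ->
      Process.sleep(delay)
      send(parent, {self(), delay})
    end)
  end)

results =
  Enum.map(pids, fn pid ->
    receive do
      {^pid, delay} -> delay
    end
  end)

# results == [30, 20, 10]: spawn order, not arrival order (10, 20, 30)
```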

Also, see my other answer about pin operator: https://stackoverflow.com/a/27975233/912225

tkowal

When you map over the collection of elements and spawn a new process for each element in the collection, you'll get back a list of pids: these pids will be in the same order as the collection you're mapping over, i.e., the pid for a given elem is in the same position in the list of pids as the elem in the original collection. This is just how mapping works, you apply an operation to each element of a list and get back a list of the results of these operations.

Now, you map over the list of pids. When you match on ^pid, the code blocks until a message from the pid you're currently mapping arrives in the current process. However, the {^pid, result} message may not be the only one, or the first one, in the current process's message queue: since all the spawned processes are running in parallel, they won't send their results in the order they were spawned. This means that when you receive and match on {^pid, result}, the message queue could contain other messages ({pid_1, result_1}, {pid_2, result_2}) before the one that matches {^pid, result}. Thanks to how message receiving works in Erlang, messages that don't match the pattern in receive are simply skipped, until one of them matches (or we keep waiting for new ones that match).

When you match on {pid, result}, you're saying that any two-element tuple is fine: in this case, pid will likely not be the PID you're currently mapping, exactly for the reasons above (spawned processes send their results back in an unpredictable order).

A more visual representation: say you have this message queue in the current process after the spawned processes have started running (we'll call the spawned processes pid1, pid2, and so on):

# The one on top is the first in the message queue:
{pid3, res3}
{pid1, res1}
{pid4, res4}
{pid5, res5}
{pid2, res2}

and let's say you're currently mapping pid1 (i.e., pid in the function you pass to Enum.map/2 is pid1).

When you do receive do {^pid, res} -> ... (with pid == pid1), the first message, {pid3, res3}, will not match, so the next message is tried; that one, {pid1, res1}, does match. {pid3, res3} stays in the message queue and the receive body is executed with {pid1, res1}. The message queue now looks like this:

# The one on top is the first in the message queue:
{pid3, res3}
{pid4, res4}
{pid5, res5}
{pid2, res2}

Back to the original queue: now, if you match on {pid, res} (without the ^ pin operator), any two-element tuple will match; specifically, {pid3, res3} will match and the receive block will be executed with that (even if the pid we're mapping is pid1). This means that in the resulting list, the result for the 3rd element (for which pid3 was spawned) is in the place of the result for the 1st element: random order!
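The unpinned case can be reproduced in isolation; in this sketch, atoms stand in for the real PIDs (an assumption made so the snippet is self-contained):

```elixir
# Pre-load the current process's mailbox in the "wrong" order:
send(self(), {:pid3, :res3})
send(self(), {:pid1, :res1})

pid = :pid1   # the pid we are "currently mapping"

result =
  receive do
    # No pin: `pid` here is a fresh binding, so ANY two-element tuple matches
    {pid, res} -> {pid, res}
  end

# result == {:pid3, :res3} -- the first message in the queue wins,
# even though we wanted :pid1's result
```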

whatyouhide