I would like to use pmap to parallelize a function over iterators, running on multiple processors of a single shared-memory machine within a Julia cluster, and I would like to clarify a few details.

  1. Is it more efficient to use SharedArray or DArray for the above scenario? When is DArray more efficient when using a single machine with shared memory?

  2. @everywhere applied to functions and variables, including arrays, dictionaries and dataframes, makes them readable by all processors. Is that done by making copies of the objects or by making them viewable?

  3. These “global” objects are available at all levels of functions. Given that global objects slow computation, should they be passed as arguments to functions instead, since they are mutable and hence no copying is done?

  4. To be able to both read and write, one needs to use SharedArray for either floating point or integer types. Is SharedArray shared without copying?

  5. Do @views, transpose, and copy work the same way as usual when @everywhere or SharedArrays are used, in that no actual copying is done?

  6. Does sdata(S::SharedArray) copy, or does it just make the SharedArray accessible for reading and writing?

  7. What changes to the code need to be made to run it on a single machine with multiple CPUs and GPUs all sharing a single memory (like the M1 processor)?

Thank you,

james
  • These are all good questions to ask in a more active forum such as Zulip. Each bullet point is worth a full discussion. – juliohm Jul 05 '21 at 21:50
  • Though it could be good to have some general answers here if we can! I probably don't have enough experience with these to do it myself – cbk Jul 06 '21 at 18:10

1 Answer

  1. SharedArray is just memory shared between processes, so there are no communication or extra memory costs. Distributed arrays use interprocess communication. However, you might have scenarios where each process works almost exclusively with its own part of a DArray.

  2. Each process in a Julia cluster has its own memory. Hence @everywhere normally results in copies: the expression is evaluated separately in every process. This also means that when loading uncompiled/unbuilt modules you might end up having some compilation races.

  3. Yes, they should be passed as arguments. Please see the Julia performance tips: https://docs.julialang.org/en/v1/manual/performance-tips/

  4. See point (1): no copying occurs in a SharedArray.

  5. Here you need a separate SO question, as it is not clear to me.

  6. sdata returns the actual Array object backing the SharedArray, so no copying occurs.

  7. Here you need a separate SO question with a specific simple use case to discuss.
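
Points (1), (2), (3), (4) and (6) can be illustrated with a minimal sketch. It assumes two local worker processes; the names `lookup` and `fill_slot!` are hypothetical and exist only for this illustration, so treat it as a starting point rather than a definitive recipe.

```julia
using Distributed

# Assumption for this sketch: start two local workers if none exist yet.
nprocs() == 1 && addprocs(2)

@everywhere using SharedArrays

# Point 2: @everywhere evaluates the expression in every process,
# so each process gets its own independent copy of `lookup`.
@everywhere lookup = zeros(3)
w = first(workers())
remotecall_fetch(Core.eval, w, Main, :(lookup[1] = 42.0))  # mutate the worker's copy only
worker_value = remotecall_fetch(Core.eval, w, Main, :(lookup[1]))
master_value = lookup[1]  # the copy on the master process is unchanged

# Point 3: a function that receives the array as an argument
# instead of reading a global (hypothetical helper).
@everywhere fill_slot!(A, i) = (A[i] = i^2; nothing)

# Points 1 and 4: one block of memory mapped into all local processes.
S = SharedArray{Float64}(8)
fill!(S, 0.0)
pmap(i -> fill_slot!(S, i), 1:length(S))  # workers write directly into S

# Point 6: sdata returns the Array backing S, not a copy,
# so a write through A is visible through S.
A = sdata(S)
A[1] = -1.0
same_memory = S[1] == -1.0
```

After running this, `worker_value` and `master_value` differ because the two processes hold separate copies of `lookup`, while the values written by the workers are visible in `S` on the master because the underlying memory is shared.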

Przemyslaw Szufel