
An N-body simulation models the dynamics of a physical system of interacting particles, or of a problem that can be reduced to particles with some physical meaning. A particle could be a gas molecule or a star in a galaxy. dask.bag provides a simple way to distribute the particles across a cluster, for example by giving dask.bag.from_sequence() a custom iterator that returns particle objects:

import random

import numpy as np
import dask.bag as db

class ParticleGenerator:
    """Iterator that yields random 3-D positions as NumPy arrays."""
    def __init__(self, num_of_particles, max_position, seed=None):
        # seed=None lets random pick a fresh seed; a time.time() default
        # would be evaluated only once, when the function is defined.
        random.seed(seed)
        self.index = -1
        self.limit = num_of_particles
        self.max_position = max_position
    def __iter__(self):
        return self
    def __next__(self):
        self.index += 1
        if self.index < self.limit:
            # One uniform random coordinate per axis, in [0, max_position)
            return np.array([self.max_position * random.random() for _ in range(3)])
        raise StopIteration
b = db.from_sequence(ParticleGenerator(1000, 1, seed=123456789))

Here, the particle object is simply a NumPy array, but it could be anything. To compute the interactions between all particles, information about position, speed and similar quantities must be shared. dask.bag.map maps a function across all elements in the collection; inside this function, the interaction between the element and all other particles is calculated to obtain the new particle state.

b = b.map(update_position, others=list(b))
b.compute()

For completeness, this is the update_position function:

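# A and B are module-level constants (arbitrary values) that must be
# defined before this function is called.
def update_position(e, others=None, mass=1, dt=1e-4):
    """Return the new position of particle e after one time step."""
    f = np.zeros(3)  # net force acting on e
    for o in others:
        r = e - o  # separation vector from o to e
        r_mag = np.sqrt(r.dot(r))
        if r_mag == 0:
            continue  # skip e itself (and any exactly coincident particle)
        f += (A / r_mag**7 + B / r_mag**13) * r
    return e + f * (dt**2 / mass)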

A and B are some arbitrary values. dask.bag.map() could be called multiple times inside a loop to run the simulation.
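A minimal sketch of such a loop, under some assumptions: the values of A and B are placeholders, run() is a hypothetical helper, the snapshot of the previous step is passed as a single NumPy array instead of a list, and scheduler="synchronous" is used only to keep this small demo in one process.

```python
import numpy as np
import dask.bag as db

# Placeholder coefficients (assumption); the question only says they are arbitrary.
A, B = 1e-6, -1e-12

def update_position(e, others=None, mass=1, dt=1e-4):
    f = np.zeros(3)
    for o in others:
        r = e - o
        r_mag = np.sqrt(r.dot(r))
        if r_mag == 0:
            continue  # skip the particle itself
        f += (A / r_mag**7 + B / r_mag**13) * r
    return e + f * (dt**2 / mass)

def run(positions, steps, npartitions=4):
    """Hypothetical driver: one dask.bag.map call per time step."""
    positions = list(positions)
    for _ in range(steps):
        # Freeze the state of the previous step; every task receives a copy.
        snapshot = np.array(positions)
        bag = db.from_sequence(positions, npartitions=npartitions)
        bag = bag.map(update_position, others=snapshot)
        # compute() returns a plain list with one new position per particle.
        positions = bag.compute(scheduler="synchronous")
    return positions

rng = np.random.default_rng(123456789)
start = [rng.random(3) for _ in range(20)]
final = run(start, steps=3)
```

Each step ships the full snapshot to every task and collects the whole result back on the client, so the cost per step grows with the square of the particle count.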

  1. Is dask.bag a good collection (abstraction) for dealing with this kind of problem? Might dask.distributed be a better idea?
  2. When the simulation is programmed this way, does the scheduler handle all communication, or is information about position, speed, etc. shared through inter-worker communication?
  3. Any comments on optimizing the code? Especially regarding the overhead of transforming the collection into a list when calling dask.bag.map().

1 Answer

Generally speaking, N-body simulations require sophisticated algorithms and data structures to run efficiently. Many common solutions use complex tree data structures. You might want to search for terms like k-d tree or Barnes-Hut.
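To give a rough idea of what a tree buys you, here is a sketch that uses SciPy's cKDTree to restrict the force sum to neighbours within a cutoff radius. The cutoff, the values of A and B, and the name step_with_cutoff are all assumptions made for illustration; truncating the sum is only reasonable for short-range forces like the one in the question, not for gravity, where something like Barnes-Hut is the usual approach.

```python
import numpy as np
from scipy.spatial import cKDTree

# Placeholder coefficients and cutoff radius (assumptions, not tuned values).
A, B = 1e-6, -1e-12
CUTOFF = 0.2

def step_with_cutoff(positions, cutoff=CUTOFF, mass=1.0, dt=1e-4):
    """Advance all particles one step, ignoring pairs farther apart than cutoff."""
    tree = cKDTree(positions)
    # For each particle, the indices of all particles within `cutoff` of it.
    neighbours = tree.query_ball_point(positions, r=cutoff)
    new_positions = positions.copy()
    for i, idx in enumerate(neighbours):
        idx = [j for j in idx if j != i]  # drop the particle itself
        if not idx:
            continue  # no neighbours in range: position unchanged
        r = positions[i] - positions[idx]      # separation vectors, shape (k, 3)
        r_mag = np.linalg.norm(r, axis=1)
        keep = r_mag > 0                       # guard against coincident points
        r, r_mag = r[keep], r_mag[keep]
        coeff = A / r_mag**7 + B / r_mag**13
        f = (coeff[:, None] * r).sum(axis=0)   # net force on particle i
        new_positions[i] = positions[i] + f * (dt**2 / mass)
    return new_positions

rng = np.random.default_rng(0)
points = rng.random((200, 3))
moved = step_with_cutoff(points)
```

With a cutoff larger than the box diagonal this reduces to the brute-force sum; the saving comes from choosing a cutoff that leaves each particle with only a handful of neighbours.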

Dask.bag, on the other hand, is one of the simplest/dumbest parallel programming abstractions you can imagine, similar to other bulk data processing systems like MapReduce and Spark. These systems are not flexible enough to give good performance on complex problems like N-body simulations.

Something like dask.array or dask.delayed will offer more flexibility, but even these won't be the same as a finely tuned KD-Tree.
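As an illustration of the array-oriented style, here is a plain NumPy sketch that computes the same all-pairs update as the question's update_position for every particle at once, using broadcasting instead of a Python loop. The values of A and B and the name step_all_pairs are assumptions; the code itself is NumPy only and does not show how to chunk the work with dask.array.

```python
import numpy as np

# Placeholder coefficients (assumption); the question only says they are arbitrary.
A, B = 1e-6, -1e-12

def step_all_pairs(positions, mass=1.0, dt=1e-4):
    """Advance all particles one step using the full N x N interaction."""
    # r[i, j] = positions[i] - positions[j], shape (N, N, 3)
    r = positions[:, None, :] - positions[None, :, :]
    r_mag = np.sqrt((r * r).sum(axis=-1))      # pairwise distances, shape (N, N)
    # Substitute 1.0 where the distance is zero so the division is safe,
    # then zero the coefficient there (this skips self-interaction).
    safe = np.where(r_mag > 0, r_mag, 1.0)
    coeff = np.where(r_mag > 0, A / safe**7 + B / safe**13, 0.0)
    f = (coeff[:, :, None] * r).sum(axis=1)    # net force per particle, shape (N, 3)
    return positions + f * (dt**2 / mass)

rng = np.random.default_rng(123456789)
positions = rng.random((1000, 3))
positions = step_all_pairs(positions)
```

The intermediate arrays hold N x N x 3 values, so memory grows quadratically with the number of particles; this is still the brute-force algorithm, just without the per-element Python overhead.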

MRocklin