An N-body simulation models the dynamics of a physical system of interacting particles, or of a problem that can be reduced to particles with physical meaning. A particle could be a gas molecule or a star in a galaxy. Dask.bag provides a simple way to distribute the particles across a cluster, for example by passing dask.bag.from_sequence() a custom iterator that returns a particle object:
    import random

    import numpy as np
    import dask.bag as db

    class ParticleGenerator:
        """Iterator that yields random 3D positions as NumPy arrays."""
        def __init__(self, num_of_particles, max_position, seed=None):
            # seed=None makes random.seed() use system entropy
            random.seed(seed)
            self.index = -1
            self.limit = num_of_particles
            self.max_position = max_position

        def __iter__(self):
            return self

        def __next__(self):
            self.index += 1
            if self.index < self.limit:
                # uniform random position in the cube [0, max_position)^3
                return np.array([self.max_position * random.random() for _ in range(3)])
            raise StopIteration

    b = db.from_sequence(ParticleGenerator(1000, 1, seed=123456789))
Here the particle object is simply a NumPy array, but it could be anything. To compute the interactions between all particles, information such as position and speed must be shared. dask.bag.map() maps a function across all elements in the collection; inside this function, the interaction between the element and every other particle is calculated to obtain the new particle state.
    b = b.map(update_position, others=list(b))
    b.compute()
For completeness, this is the update_position function:
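    def update_position(e, others=None, mass=1, dt=1e-4):
        # accumulate the force on particle e from every other particle
        f = np.zeros(3)
        for o in others:
            r = e - o
            r_mag = np.sqrt(r.dot(r))
            if r_mag == 0:
                # zero separation: this is the particle itself, skip it
                continue
            f += (A / r_mag**7 + B / r_mag**13) * r
        return e + f * (dt**2 / mass)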
A and B are arbitrary constants. dask.bag.map() could be called repeatedly inside a loop to run the simulation.
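A minimal sketch of what such a loop might look like. The run_simulation helper, its parameters, the values chosen for A and B, and the use of the synchronous scheduler are all assumptions made for illustration; update_position is repeated only so the block runs on its own.

```python
import numpy as np
import dask.bag as db

# Placeholder coefficients (assumption: the post only says they are arbitrary).
A, B = 1.0, -1.0

def update_position(e, others=None, mass=1, dt=1e-4):
    # Same pairwise update as in the post, repeated to keep the sketch self-contained.
    f = np.zeros(3)
    for o in others:
        r = e - o
        r_mag = np.sqrt(r.dot(r))
        if r_mag == 0:
            continue
        f += (A / r_mag**7 + B / r_mag**13) * r
    return e + f * (dt**2 / mass)

def run_simulation(positions, n_steps, npartitions=4, scheduler="synchronous"):
    """Hypothetical driver: one Bag.map() pass per time step."""
    positions = list(positions)
    for _ in range(n_steps):
        bag = db.from_sequence(positions, npartitions=npartitions)
        # The full position list is passed to every task as a keyword argument,
        # so each step materializes all particles on the client first.
        positions = bag.map(update_position, others=positions).compute(
            scheduler=scheduler
        )
    return positions

if __name__ == "__main__":
    start = [np.array([0.0, 0.0, 0.0]), np.array([2.0, 0.0, 0.0])]
    for p in run_simulation(start, n_steps=3):
        print(p)
```

Each iteration gathers every position back to the client and rebuilds the bag, which is exactly the list conversion asked about below. The synchronous scheduler is used here only to keep the sketch deterministic and easy to debug.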
- Is Dask.bag a good collection (abstraction) for this kind of problem? Would Dask.distributed be a better fit?
- When the simulation is programmed this way, does the scheduler handle all communication, or is information such as position and speed shared through inter-worker communication?
- Any suggestions for optimizing the code, especially regarding the overhead of converting the collection into a list when calling dask.bag.map()?