2

I am trying to create a simple self-avoiding freely-jointed chain. The chain part is already completed, it starts at a given point in 3D space (e.g. [0,0,0]). I then generate a new point by randomizing two spherical coordinates θ, φ (the radius is given). Giving me a new point.

Since the points are randomly placed (as opposed to integers a normal 3D walk on a 3D grid), I cannot check if the exact point has been visited before (i.e., iterate over an array of visited points). Instead I have to check if the point lies within a certain radius of previously visited points (small spheres created by every visited point).

I have tried extracting the part of my code I need help with, and adjusted it so it runs by itself. This is with step length 1, and the "avoid radius" = 0.5 (chosen arbitrarily)

import numpy as np
import random

# Generating new point in spherical coordinates, convert to cartesian coordinates
steps = 10
visited = np.zeros([steps,3],dtype=float) # "array of all visited points"

steplength= 1
costheta = random.uniform(-1,1)
theta = np.arccos(costheta)
phi = random.random() * 2 * np.pi

newpos = numpy.copy(visited[-1]) # copy of "previous point"
newpos[0] += steplength* np.cos(phi) * np.sin(theta)
newpos[1] += steplength* np.sin(phi) * np.sin(theta)
newpos[2] += steplength* np.cos(theta)

# Self-avoid condition (part I need help with)
tooclose = False
for i in range(steps): #For every visited position, check if newpos is too close
    diff = np.subtract(pos, self.visited[i])
    dist = np.sqrt(diff.dot(diff))
    if dist < avoidradius ** 2:
        tooclose = True
        break

As shown in the code, I simply check the distance from every point and check if it's smaller than the allowed radius. This approach works to make it "self-avoiding", but it is slow. Is there a better way to do it? Perhaps using a different coordinate system, different numpy code (I am not very good at numpy), a different library or a different setup altogether?

Peter O.
  • 32,158
  • 14
  • 82
  • 96
Teddi
  • 21
  • 2
  • 1
    How big is the 3D space you're using? How long would your chain end up being? If the chain isn't very long i would just use numba to accelerate your code as it is now. if the chain is long, but the space is not big, I would save an array representing the distance of each round coordinate from the closest center of a node and use that array when creating the next node in the chain. if the chain is very long and the space is very big, I would split the space to large blocks and use a hierarchical approach to map the "occupancy" of the space using hash tables or smth. – yann ziselman Jan 17 '22 at 09:47
  • Great suggestions, just looked up numba for the first time. Seems like what I am looking for. I'll be using spaces and lengths that are large enough to allow meaningful comparisons to the mathematical models, basically. The approach of dividing into nodes is interesting. Thanks! – Teddi Jan 17 '22 at 09:59
  • The algorithm used to compute the distance is certainly not efficient. You can find the closest points using an octree or a k-D tree. See for example [this](https://stackoverflow.com/a/68622045/12939557) previous post. – Jérôme Richard Jan 17 '22 at 19:01
  • Do you only need to check whether a single freely-jointed chain is self-avoiding, or do you need to create one or more? – DanielTuzes Jan 26 '22 at 14:58

1 Answers1

0

Here is a corrected and slightly extended (initialized) version of your code sample. The vectorized loop is attached at the bottom.

Note: vectorization with numpy always means: look out for any for-loops --> get rid of them.

There are a few things about your problem. Since in the for-loop approach you break whenever you find a solution to the if, the performance depends a lot on WHEN you find the solution. Since in the case of the random walk this will very very likely be one of the very last entries, it is much better to reverse the loop (start at the end). Try yourself.

The vectorized code never breaks, it always does all the computations -- but super-fast.

import numpy as np

# Generating new point in spherical coordinates, convert to cartesian coordinates
steps = 100
visited = np.zeros([steps,3], dtype=float) # "array of all visited points"

steplength = 1.
avoidradius = 1

# fill points
for iPoint in range(steps):
    costheta = np.random.uniform(-1,1)
    theta = np.arccos(costheta)
    phi = np.random.random() * 2 * np.pi
    visited[iPoint,0] = visited[iPoint-1,0] + steplength* np.cos(phi) * np.sin(theta)
    visited[iPoint,1] = visited[iPoint-1,1] + steplength* np.sin(phi) * np.sin(theta)
    visited[iPoint,2] = visited[iPoint-1,2] + steplength* np.cos(theta)
    
# "new" point
costheta = np.random.uniform(-1,1)
theta = np.arccos(costheta)
phi = np.random.random() * 2 * np.pi
newpos = np.array(visited[-1])
newpos[0] += steplength * np.cos(phi) * np.sin(theta)
newpos[1] += steplength * np.sin(phi) * np.sin(theta)
newpos[2] += steplength * np.cos(theta)

# your solution
# %%timeit # use this in jupyter lab

# Self-avoid condition (part I need help with)
tooclose = False
for iPoint in range(steps): # For every visited position, check if newpos is too close
    diff = np.subtract(newpos, visited[iPoint])
    dist = diff.dot(diff)
    if dist < avoidradius**2:
        tooclose = True
        break


# vectorized solution    
# %%timeit # use this in jupyter lab

dist = (visited - newpos)**2
isclose = np.sum(dist, axis=1) < avoidradius**2
tooclose = np.count_nonzero(isclose) > 0

You can paste that into jupyter lab and run some tests, which I did:

For steps=10

Your version:

14.7 µs ± 360 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

Vectorized version:

8.68 µs ± 212 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

For steps=100

Your version:

143 µs ± 5.36 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

Vectorized version:

11.1 µs ± 152 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

Thus, you see, for small n it basically doesn't matter. For large n the vectorized loop will hardly become slower while the for-loop becomes super-slow.

Ralf Ulrich
  • 1,575
  • 9
  • 25