1

Below I give an example in 2D, but my real question would be regarding 3D (with or without periodic boundaries). Find all unique neighbors for each segment id using 8 neighbors (2d) or 26 neighbors (3d).

Given the following array:

INPUT

matrix=[
    [1, 2, 2, 3],
    [1, 4, 5, 3],
    [4, 4, 5, 3]
]

OUTPUT non-periodic

1 : [1, 2, 4]
2 : [1, 2, 3, 4, 5]
3 : [2, 3, 5]
4 : [1, 2, 4, 5]
5 : [2, 3, 4, 5] 

OUTPUT periodic

1 : [1, 2, 3, 4]
2 : [1, 2, 3, 4, 5]
3 : [1, 2, 3, 4, 5]
4 : [1, 2, 3, 4, 5]
5 : [2, 3, 4, 5] 

I got a stack of for loops to do the job, but I would really like to use a more numpy/scipy based approach if that could work. I feel like some form of clever convolve could do the trick, I am just not seeing how.

An example of such slow code is presented here.

2 Answers2

0

I think I have found a pretty efficient solution.

def get_all_pairs(label_array):
    """
    Returns all contact pairs in periodic 2d.

    Parameters
    ----------
    array : numpy.ndarray(int64)
        The input label array.

    Returns
    -------
    numpy.ndarray(int32)
        All contact pairs (Nx2) in all dimensions.
    
    """
    all_pairs = []
    # Perform the 4 unique shifts in 2d.
    for i in range(4):
        mask = np.zeros((3,3))
        if i == 0:
            mask[0,2] = 1
        elif i == 1:
            mask[1,2] = 1
        elif i == 2:
            mask[2,2] = 1
        else:
            mask[2,1] = 1
        # Calculate the shifted array using PBC.
        shifted = ndimage.convolve(label_array, mask, output=None, mode='wrap', cval=0.0, origin=0)
        # Find the indices where the label and shifted array differ.
        index_mask = shifted != label_array
        # Find both sides of the pair
        queries = label_array[index_mask]
        targets = shifted[index_mask]
        # Obtain the pairs by merging the queries and targets
        pairs = np.vstack([queries, targets]).T
        # Filter for unique pairs
        pairs = np.unique(pairs, axis=0)
        # Add the pairs in the current shift to the all pairs list.
        all_pairs.append(pairs)
    # Convert the list into an array
    all_pairs = np.vstack(all_pairs)
    # Sort the pairs
    all_pairs = np.array([sorted(pair) for pair in all_pairs])
    # Filter for unique pairs
    all_pairs = np.unique(all_pairs, axis=0)
    return all_pairs

I perform the unique before and after the sorting, this because the sorting is pretty slow and it is best performed on the least amount of pairs. This method also generalizes rather straight forward to 3d.

0

I got about an 8x speedup from your solution by using scipy's ndimage.value_indices. (https://docs.scipy.org/doc/scipy/reference/generated/scipy.ndimage.value_indices.html). How well this scales I don't know but I suspect that running 4 2/3-d convolutions is way more expensive than just incrementing one set of index arrays. (The time tests at the bottom suggest that get_all_pairs_update scale linearly while get_all_pairs scales as n^2).

You'll have to tweak for boundary conditions, but this should get you started.

from collections import defaultdict

import numpy as np
from scipy import ndimage
from scipy.sparse import csr_matrix
from timeit import timeit

def get_all_pairs_update(label_array):
    """
    Returns all contact pairs in periodic 2d.

    Parameters
    ----------
    array : numpy.ndarray(int64)
        The input label array.

    Returns
    -------
    dict: 
        set of all neighbors to a given label
    
    """
    # get the unique values and their locations
    neighborhoods = ndimage.value_indices(label_array)

    borders = dict()
    for neighborhood, indices in neighborhoods.items():
        neighbors = list()
        for dim, sz in enumerate(label_array.shape): 
            # cast indices to a list to shift them, recast them to tuple because that's what numpy arrays 
            these_indices = list(indices)
            these_indices[dim] = indices[dim] - 1
            neighbors += label_array[tuple(these_indices)].tolist()
            these_indices[dim] = (indices[dim] + 1) % sz
            neighbors += label_array[tuple(these_indices)].tolist()
        # use python sets to get the unique values. get rid of the neighbor to self term 
        borders[neighborhood] = set(neighbors).difference(set((neighborhood,)))
    return borders
    

matrix=np.array([
    [1, 2, 2, 3],
    [1, 4, 5, 3],
    [4, 4, 5, 3]
])

print(get_all_pairs_update(matrix))
# {1: {2, 3, 4}, 2: {1, 3, 4, 5}, 3: {1, 2, 4, 5}, 4: {1, 2, 3, 5}, 5: {2, 3, 4}}

print(timeit('get_all_pairs(matrix)', globals=globals(), number=100))
# 0.03094233898445964
print(timeit('get_all_pairs_update(matrix)', globals=globals(), number=100))
# 0.004022964974865317


matrix=np.random.randint(7, size=(100,100))
print(timeit('get_all_pairs(matrix)', globals=globals(), number=100))
# 1.1153271459625103
print(timeit('get_all_pairs_update(matrix)', globals=globals(), number=100))
# 0.07583873602561653
Elliot
  • 2,541
  • 17
  • 27