
I have a 2D data array and I'm trying to get a profile of values about its center in an efficient manner. The output should be two one-dimensional arrays: one with the distances from the center, the other with the means of all the values in the original 2D array that lie at those distances.

Each index has a non-integer distance from the center, which prevents me from using some already-known solutions to the problem. Allow me to explain.

Consider these matrices

import numpy as np

data = np.random.randn(5,5)
L = 2
x = np.arange(-L,L+1,1)*2.5
y = np.arange(-L,L+1,1)*2.5
xx, yy = np.meshgrid(x, y)
r = np.sqrt(xx**2. + yy**2.)

So the matrices are

In [30]: r
Out[30]: 
array([[ 7.07106781,  5.59016994,  5.        ,  5.59016994,  7.07106781],
       [ 5.59016994,  3.53553391,  2.5       ,  3.53553391,  5.59016994],
       [ 5.        ,  2.5       ,  0.        ,  2.5       ,  5.        ],
       [ 5.59016994,  3.53553391,  2.5       ,  3.53553391,  5.59016994],
       [ 7.07106781,  5.59016994,  5.        ,  5.59016994,  7.07106781]])

In [31]: data
Out[31]: 
array([[ 1.27603322,  1.33635284,  1.93093228,  0.76229675, -0.00956535],
       [ 0.69556071, -1.70829753,  1.19615919, -1.32868665,  0.29679494],
       [ 0.13097791, -1.33302719,  1.48226442, -0.76672223, -1.01836614],
       [ 0.51334771, -0.83863115, -0.41541794,  0.34743342,  0.1199237 ],
       [-1.02042539,  0.90739383, -2.4858624 , -0.07417987,  0.90748933]])

For this case the expected output should be array([ 0. , 2.5 , 3.53553391, 5. , 5.59016994, 7.07106781]) for the distances, and a second array of the same length with the mean of all the values at each corresponding distance: array([ 1.48226442, -0.32975204, -0.88204548, -0.36057959, 0.56968633, 0.28838295]). (For example, the entry for distance 2.5 is the mean of data[1,2], data[2,1], data[2,3] and data[3,2].)

From this answer there is a very nice function to compute the profile about any arbitrary point. However, the problem with his approach is that it approximates the distance r by the index distance. So his r for my case would be this:

array([[2, 2, 2, 2, 2],
       [2, 1, 1, 1, 2],
       [2, 1, 0, 1, 2],
       [2, 1, 1, 1, 2],
       [2, 2, 2, 2, 2]])

which is a pretty big difference for me, since I'm working with small matrices. This approximation, however, allows him to use np.bincount, which is pretty handy (but won't work for me).

I've been trying to extend this to float distances, like my version of r, but so far no luck. bincount doesn't work with floats, and histogram needs equally-spaced bins, which is not the case here. Any suggestions?
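
To make the first dead end concrete, this is what happens if I feed my float-valued r straight into bincount:

np.bincount(r.ravel())
# raises TypeError: bincount only accepts arrays of non-negative integers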

TomCho
  • How about using `((xx**2. + yy**2.)/6.25).astype(int)` as the bins for `bincount`? – Divakar Mar 07 '17 at 22:30
  • Or use `np.digitize` on `r`. – Paul Panzer Mar 07 '17 at 22:45
  • @PaulPanzer I don't see how digitize could be used to do that. Care to provide an example? – TomCho Mar 07 '17 at 23:06
  • It's similar to what @Divakar's library function does. You would provide an ascending sequence of distances defining the rings which you want grouped together. Applying these bins to `r` you'd get a matrix very similar to Divakar's Out[280] (see the sketch just below these comments). -- Unrelated: I was wondering, are you going to apply the same distance profile to many images? – Paul Panzer Mar 07 '17 at 23:18
  • @PaulPanzer The distance profile for a given set is fixed, so I'd just pass it to the function with the list of images – TomCho Mar 07 '17 at 23:21
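
A minimal sketch of what the np.digitize suggestion above could look like on the question's 5×5 grid (the edges values here are a hypothetical choice of ring boundaries, not from the thread):

import numpy as np

# rebuild the question's distance grid
x = np.arange(-2, 3) * 2.5
r = np.sqrt(x[:, None]**2 + x[None, :]**2)
data = np.random.randn(5, 5)

edges = np.array([1., 3., 5., 6.])      # ascending ring edges (hypothetical)
labels = np.digitize(r, edges).ravel()  # ring label for every pixel
prof = np.bincount(labels, weights=data.ravel()) / np.bincount(labels)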

3 Answers


Approach #1

def radial_profile_app1(data, r):
    mid = data.shape[0]//2
    # on a uniform grid every squared distance is an integer multiple of
    # the smallest nonzero squared distance, so this yields integer bin IDs
    ids = np.rint((r**2)/r[mid-1,mid]**2).astype(int).ravel()
    count = np.bincount(ids)

    R = data.shape[0]//2 # Radial profile radius
    R0 = R+1
    # the unique distances all appear in one quadrant's lower triangle
    dists = np.unique(r[:R0,:R0][np.tril(np.ones((R0,R0),dtype=bool))])

    # grouped sums / counts, dropping the empty integer bins
    mean_data = (np.bincount(ids, data.ravel())/count)[count!=0]
    return dists, mean_data

For the given sample data -

In [475]: radial_profile_app1(data, r)
Out[475]: 
(array([ 0.        ,  2.5       ,  3.53553391,  5.        ,  5.59016994,
         7.07106781]),
 array([ 1.48226442, -0.32975204, -0.88204548, -0.36057959,  0.56968633,
         0.28838295]))

Approach #2

def radial_profile_app2(data, r):
    R = data.shape[0]//2 # Radial profile radius
    range_arr = np.arange(-R,R+1)
    # build the integer squared index-distances directly, avoiding
    # the float rounding step of approach #1
    ids = (range_arr[:,None]**2 + range_arr**2).ravel()
    count = np.bincount(ids)

    R0 = R+1
    dists = np.unique(r[:R0,:R0][np.tril(np.ones((R0,R0),dtype=bool))])

    mean_data = (np.bincount(ids, data.ravel())/count)[count!=0]
    return dists, mean_data

Runtime test -

In [562]: # Setup inputs
     ...: N = 2001
     ...: data = np.random.randn(N,N)
     ...: L = (N-1)//2
     ...: x = np.arange(-L,L+1,1)*2.5
     ...: y = np.arange(-L,L+1,1)*2.5
     ...: xx, yy = np.meshgrid(x, y)
     ...: r = np.sqrt(xx**2. + yy**2.)
     ...: 

In [563]: out01, out02 = radial_profile_app1(data, r)
     ...: out11, out12 = radial_profile_app2(data, r)
     ...: 
     ...: print np.allclose(out01, out11)
     ...: print np.allclose(out02, out12)
     ...: 
True
True

In [566]: %timeit radial_profile_app1(data, r)
     ...: %timeit radial_profile_app2(data, r)
     ...: 
10 loops, best of 3: 114 ms per loop
10 loops, best of 3: 91.2 ms per loop
Divakar
  • Seems interesting, but I fail to see an easy way to put everything as a function of the original `r` array. – TomCho Mar 07 '17 at 23:05
  • @TomCho Then `Approach #3` might suit your needs. – Divakar Mar 07 '17 at 23:15
  • I'm trying to understand what you did now. I think you might have misunderstood what I'm looking for. The output that I'm looking for, for my case, has an index of `array([ 0. , 2.5 , 3.53553391, 5. , 5.59016994, 7.07106781])`, and the values are the `mean`s of every value on the `data` matrix that has those indices. If that's not what you understood please let me know, so that I can update my question. – TomCho Mar 07 '17 at 23:33
  • @TomCho Yes, I totally missed seeing any mention of the term `mean` or `average` anywhere in the question or comments. So, please feel free to update the question on the same :) – Divakar Mar 07 '17 at 23:35
  • I just posted an edit that hopefully made it more understandable. But your answer gave me an idea that might solve my problem, although it's not really very optimized. I'll post it in a second if it works – TomCho Mar 07 '17 at 23:41
  • @TomCho Also, would be helpful if you could add the expected output(s) in the question. – Divakar Mar 07 '17 at 23:42
  • I added that to the question. You gave me an insight into my solution, which I posted, but I'm not happy with. So feel free to try a solution again, I'm guessing you can come up with something faster than what I have – TomCho Mar 07 '17 at 23:57
  • @TomCho Well I remodelled those. Added timings for mine. So, check those out! – Divakar Mar 08 '17 at 00:40

Got what I was expecting with this function:

def radial_prof(data, r):
    uniq = np.unique(r)
    # one boolean mask and one mean per unique distance
    prof = np.array([ np.mean(data[ r==un ]) for un in uniq ])
    return uniq, prof

But I'm still not happy with the fact that I had to use a list comprehension (i.e. a Python loop), since it might be slow for very large matrices.
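
For reference, one possible loop-free variant of the same idea (a sketch using np.unique's return_inverse; radial_prof_vec is just an illustrative name):

import numpy as np

def radial_prof_vec(data, r):
    # label each pixel with the index of its unique distance, then let
    # bincount compute the grouped sums and counts in one shot
    uniq, inv = np.unique(r, return_inverse=True)
    sums = np.bincount(inv.ravel(), weights=data.ravel())
    counts = np.bincount(inv.ravel())
    return uniq, sums / counts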

TomCho

Here is an indirect-sorting approach that should scale well if the batch size and/or the number of bins are large. The sorting is O(n log n); all the histogramming is O(n). I've also added a little unscientific speed test. For the speed test I use flat indexing, but I left the 2D-index code in because it's more flexible when dealing with images of different sizes etc.

import numpy as np

# this need only be run once per batch (it depends only on r, not the image)
def r_to_ind(r, dist_bins="auto"):
    f = np.argsort(r.ravel())                 # indirect sort of the distances
    if dist_bins == "auto":
        rs = r.ravel()[f]
        # start index of each run of equal distances in the sorted order
        bins = np.where(np.r_[True, rs[1:]!=rs[:-1]])[0]
        dist_bins = rs[bins]
    else:
        bins = np.searchsorted(r.ravel()[f], dist_bins)
    denom = np.diff(np.r_[bins, r.size])      # number of pixels per bin
    return f, np.unravel_index(f, r.shape), bins, denom, dist_bins

# this is with adjustable offset
def profile_xy(image, yx, ij, bins, nynx, denom):
    (y, x), (i, j), (ny, nx) = yx, ij, nynx
    return np.add.reduceat(image[i + y - ny//2, j + x - nx//2], bins) / denom

# this is fixed
def profile_xy_no_offset(image, ij, bins, denom):
    return np.add.reduceat(image[ij], bins) / denom

# this is fixed and flat
def profile_xy_no_offset_flat(image, k, bins, denom):
    return np.add.reduceat(image.ravel()[k], bins) / denom

data = np.array([[ 1.27603322,  1.33635284,  1.93093228,  0.76229675, -0.00956535],
       [ 0.69556071, -1.70829753,  1.19615919, -1.32868665,  0.29679494],
       [ 0.13097791, -1.33302719,  1.48226442, -0.76672223, -1.01836614],
       [ 0.51334771, -0.83863115, -0.41541794,  0.34743342,  0.1199237 ],
       [-1.02042539,  0.90739383, -2.4858624 , -0.07417987,  0.90748933]])

r = np.array([[ 7.07106781,  5.59016994,  5.        ,  5.59016994,  7.07106781],
       [ 5.59016994,  3.53553391,  2.5       ,  3.53553391,  5.59016994],
       [ 5.        ,  2.5       ,  0.        ,  2.5       ,  5.        ],
       [ 5.59016994,  3.53553391,  2.5       ,  3.53553391,  5.59016994],
       [ 7.07106781,  5.59016994,  5.        ,  5.59016994,  7.07106781]])

f, (i, j), bins, denom, dist_bins = r_to_ind(r)

result = profile_xy(data, (2, 2), (i, j), bins, (5, 5), denom)
print(dist_bins)
# [ 0.          2.5         3.53553391  5.          5.59016994  7.07106781]
print(result)
# [ 1.48226442 -0.32975204 -0.88204548 -0.36057959  0.56968633  0.28838295]

#########################

from timeit import timeit

n = 2001
batch = 100
fake = 10

a = np.random.random((fake, n, n))
l = np.linspace(-1, 1, n)**2
r = sum(np.ix_(l, l))  # broadcast sum of squares -> squared distances (same bins)

def run_all():
    f, ij, bins, denom, dist_bins = r_to_ind(r)
    for b in range(batch):
        profile_xy_no_offset_flat(a[b%fake], f, bins, denom)

print(timeit(run_all, number=10))
# 47.4157 (for 10 batches of 100 images of size 2001x2001)
# and my computer is slower than Divakar's ;-)

I've made some more benchmarks comparing mine to @Divakar's approach 3, stripping out everything precomputable into a run-once-per-batch function. The general finding: they are similar; mine has a higher upfront cost but is then faster, with the crossover at around 100 pictures per batch.

Paul Panzer