
I am attempting to optimize something akin to this MWE (minimal working example). I'm using a list comprehension, but I believe I should be able to vectorize it somehow.

import numpy

A = numpy.arange(20000)
B = numpy.arange(20000, 50000)
C = [bin(i ^ j).count('1') for i in A for j in B].count(1)

(This is a search for all members of group A that are Hamming distance 1 from a member of group B.) The sizes are of the correct order of magnitude, but I'll repeat the entire sequence about 100 times. The average value of C is expected to be around 10k.

I've been unsuccessful in creating a universal function uhamming for bin(i^j).count('1') with numpy.frompyfunc; I'm getting

module 'numpy' has no attribute 'uhamming'

I'd be quite happy to have C be an array. Thanks for looking!

FYI, here's my profiling output for a scaled-down version using arange(2000) and arange(2000, 5000):

     12000007 function calls in 5.442 seconds

Ordered by: internal time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    1    2.528    2.528    5.342    5.342 <string>:4(<listcomp>)
6000000    1.527    0.000    1.527    0.000 {method 'count' of 'str' objects}
6000000    1.287    0.000    1.287    0.000 {built-in method builtins.bin}
    1    0.089    0.089    0.089    0.089 {method 'count' of 'list' objects}
    1    0.011    0.011    5.442    5.442 <string>:2(<module>)
    1    0.000    0.000    5.442    5.442 {built-in method builtins.exec}
    2    0.000    0.000    0.000    0.000 {built-in method numpy.core.multiarray.arange}
    1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

1 Answer


The error suggests that somewhere you are using

numpy.uhamming
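
frompyfunc returns the new function as its result; it doesn't attach anything to the numpy module, so looking the name up there raises exactly that AttributeError. A minimal reproduction, assuming that's what happened:

import numpy as np

def uhamming(i, j):
    return bin(i ^ j).count('1')

np.frompyfunc(uhamming, 2, 1)   # returned function discarded instead of kept
np.uhamming(3, 5)               # AttributeError: module 'numpy' has no attribute 'uhamming'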

A quick and dirty use of frompyfunc would be:

In [126]: def unhamming(i,j):
     ...:     return bin(i^j).count('1')
     ...: 
In [127]: f = np.frompyfunc(unhamming, 2,1)

The function takes 2 inputs and returns 1 output.

With smaller arrays:

In [124]: A = np.arange(200)
In [125]: B = np.arange(200,500)
In [128]: C = [bin(i^j).count('1') for i in A for j in B].count(1)
In [131]: C
Out[131]: 336

Using the 'vectorized' function:

In [129]: f(A,B[:,None])
Out[129]: 
array([[3, 4, 4, ..., 3, 3, 4],
       [4, 3, 5, ..., 2, 4, 3],
       [4, 5, 3, ..., 4, 2, 3],
       ..., 
       [6, 5, 7, ..., 4, 6, 5],
       [6, 7, 5, ..., 6, 4, 5],
       [7, 6, 6, ..., 5, 5, 4]], dtype=object)

And to find out how many 1s there are, either turn the result into a list or use a numpy sum.

In [130]: _.ravel().tolist().count(1)
Out[130]: 336
In [132]: (f(A,B[:,None])==1).sum()
Out[132]: 336

Speed is basically the same:

In [133]: timeit C = [bin(i^j).count('1') for i in A for j in B].count(1)
45 ms ± 380 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [134]: timeit (f(A,B[:,None])==1).sum()
46.1 ms ± 97.4 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Sometimes frompyfunc gives up to a 2x speed improvement over direct iteration, but it still has to call your unhamming function many times. It is not true 'vectorization' (in the sense of moving the calculations to C-level code).

I suspect there's a way of doing the same calculation with numpy expressions, broadcasting A against B[:,None]. But I'll leave that to another time or poster.


C = A ^ B[:,None]

does part of your function. But I haven't found a version of bin that works on arrays (np.binary_repr doesn't help).

In [160]: f1 = np.frompyfunc(lambda x: bin(x).count('1'),1,1)
In [161]: f1(A^B[:,None])
Out[161]: 
array([[3, 4, 4, ..., 3, 3, 4],
       [4, 3, 5, ..., 2, 4, 3],
       [4, 5, 3, ..., 4, 2, 3],
       ..., 
       [6, 5, 7, ..., 4, 6, 5],
       [6, 7, 5, ..., 6, 4, 5],
       [7, 6, 6, ..., 5, 5, 4]], dtype=object)
In [162]: (f1(A^B[:,None])==1).sum()
Out[162]: 336
In [163]: timeit (f1(A^B[:,None])==1).sum()
37 ms ± 295 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Just a small improvement.
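
One fully vectorized possibility for the bit counting is np.unpackbits; this is only a sketch, assuming the XOR results fit in 16 bits (true here, and for the 0..50000 range in the question), and the helper name hamming1_unpack is arbitrary:

import numpy as np

def hamming1_unpack(A, B):
    x = (A ^ B[:, None]).astype(np.uint16)   # pairwise XOR, 16 bits per value
    bits = np.unpackbits(x.view(np.uint8))   # every uint16 becomes 16 bit flags
    popcnt = bits.reshape(-1, 16).sum(axis=1).reshape(x.shape)  # per-pair popcount
    return (popcnt == 1).sum()               # pairs at Hamming distance exactly 1

For the small test arrays above it should reproduce the 336 count, with the bit counting done entirely in compiled code.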


Searching on [numpy] and hamming produced this question: Fastest way to get hamming distance for integer array

Here's an adaptation of one of @Divakar's answers:

def foo(A, B):
    x = np.bitwise_xor(A, B[:,None])   # pairwise XOR, shape (len(B), len(A))
    r = (1 << np.arange(15))[:,None]   # bit masks 1, 2, 4, ..., one per bit position
    xr = (r[:,None] & x)               # isolate each bit of every XOR value
    xrc = (xr > 0).sum(axis=0)         # popcount: number of set bits per pair
    return (xrc == 1).sum()            # pairs at Hamming distance exactly 1
In [280]: foo(A,B)
Out[280]: 336

It could be tweaked, for example by adjusting the size of r or changing the broadcasting and reshapes. But the final sum matches.
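
Here's one such tweak as a sketch (the name foo16 and the chunk size are arbitrary choices): widening r to 16 bits covers the question's full 0..50000 range, and processing B in chunks keeps the (16, len(B), len(A)) boolean intermediate from exhausting memory at the full 20000 x 30000 size:

def foo16(A, B, chunk=1000):
    r = 1 << np.arange(16)                   # one mask per bit position
    total = 0
    for start in range(0, len(B), chunk):
        x = np.bitwise_xor(A, B[start:start + chunk, None])  # pairwise XOR
        popcnt = ((r[:, None, None] & x) > 0).sum(axis=0)    # set bits per pair
        total += (popcnt == 1).sum()
    return total

For the small test arrays, foo16(A, B) should return the same 336.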

hpaulj
  • Thanks @hpaulj, I figured the function calls would add some overhead and I'll play with this for larger numbers. And yes, I was using `numpy.uhamming.outer(A, B)`. What does `frompyfunc` do if not turn my function into a universal I could then use for broadcasting? – zazizoma Oct 20 '17 at 23:07
  • It doesn't turn it into a `ufunc` (with methods like `outer` and `reduce`). But it does broadcast the input arrays, as shown in `In [129]`, same as `A*B[:,None]` or `np.add(A, B[:,None])`. – hpaulj Oct 20 '17 at 23:13
  • Thanks again @hpaulj. I'm still surprised the list comps aren't much less efficient. `concurrent.futures`? – zazizoma Oct 20 '17 at 23:24
  • Numba on the hamming function helps a bit . . . – zazizoma Oct 21 '17 at 00:01
  • https://stackoverflow.com/questions/40875282/fastest-way-to-get-hamming-distance-for-integer-array might help. – hpaulj Oct 21 '17 at 02:13
  • This is all great stuff. List comprehension still wins out at larger values, i.e. `A = numpy.arange(5000); B = numpy.arange(5000, 15000)`. This suggests I may need to code this algorithm in C. It's ultimately 20 * (2 ** 20) Hamming distance calculations, and I can't get this to complete on my machine without a kernel crash. – zazizoma Oct 22 '17 at 23:10