I am attempting to optimize something akin to this MWE code. I'm currently using a list comprehension, but I believe it should be vectorizable somehow.
```python
import numpy

A = numpy.arange(20000)
B = numpy.arange(20000, 50000)
C = [bin(i ^ j).count('1') for i in A for j in B].count(1)
```
(This is a search for all members of group A that are Hamming distance 1 from a member of group B.) The sizes are of the correct order of magnitude, but I'll repeat the entire sequence about 100 times. The value of C is expected to average around 10k.
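For what it's worth, one way to vectorize this is to XOR A against B with broadcasting and count bits with a 256-entry byte-popcount lookup table instead of bin()/str.count. A sketch, processing A in chunks to bound the size of the intermediate arrays (the helper name, chunk size, and the smaller demo sizes are my own choices, not from the question):

```python
import numpy

# Popcount of every possible byte value; indexing this table vectorizes
# the bit counting that bin(...).count('1') does per element.
POPCOUNT_TABLE = numpy.array([bin(i).count('1') for i in range(256)],
                             dtype=numpy.uint16)

def hamming1_count(A, B, chunk=256):
    """Count pairs (a, b) in A x B with Hamming distance exactly 1."""
    A = A.astype(numpy.uint64)
    B = B.astype(numpy.uint64)
    total = 0
    for start in range(0, len(A), chunk):
        # Pairwise XOR of a chunk of A against all of B, shape (c, m).
        x = A[start:start + chunk, None] ^ B[None, :]
        # View each uint64 as 8 bytes, look up per-byte popcounts,
        # then sum the 8 byte counts to get each pair's Hamming distance.
        bits = POPCOUNT_TABLE[x.view(numpy.uint8).reshape(x.shape + (8,))].sum(axis=2)
        total += int((bits == 1).sum())
    return total

A = numpy.arange(2000)
B = numpy.arange(2000, 5000)
C = hamming1_count(A, B)
```

The chunking keeps the (chunk, len(B), 8) byte view from materializing the full 20000 x 30000 problem at once; chunk can be tuned against available memory.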
I've been unsuccessful in creating a universal function uhamming for bin(i ^ j).count('1') with numpy.frompyfunc; I'm getting

```
module 'numpy' has no attribute 'uhamming'
```

I'd be quite happy to have C be an array. Thanks for looking!
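For what it's worth, numpy.frompyfunc returns the new ufunc rather than attaching it to the numpy module, which would explain the AttributeError; a minimal sketch of capturing and using the returned ufunc:

```python
import numpy

# frompyfunc returns the ufunc object; it is NOT registered as a numpy
# attribute, so numpy.uhamming raises AttributeError.
uhamming = numpy.frompyfunc(lambda i, j: bin(i ^ j).count('1'), 2, 1)

A = numpy.arange(8)
B = numpy.arange(8, 16)
# Broadcasting yields the full pairwise distance matrix. The result has
# dtype=object, since frompyfunc ufuncs return Python objects.
D = uhamming(A[:, None], B[None, :])
```

Note that this still calls the Python lambda once per pair, so it tidies the code without removing the per-element overhead the profile shows.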
FYI, here's my profiling output for a minimized version using (2000) and (2000, 5000):
```
         12000007 function calls in 5.442 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    2.528    2.528    5.342    5.342 <string>:4(<listcomp>)
  6000000    1.527    0.000    1.527    0.000 {method 'count' of 'str' objects}
  6000000    1.287    0.000    1.287    0.000 {built-in method builtins.bin}
        1    0.089    0.089    0.089    0.089 {method 'count' of 'list' objects}
        1    0.011    0.011    5.442    5.442 <string>:2(<module>)
        1    0.000    0.000    5.442    5.442 {built-in method builtins.exec}
        2    0.000    0.000    0.000    0.000 {built-in method numpy.core.multiarray.arange}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
```