A is a list of lists containing only 0s and 1s. What is the most Pythonic (and also fairly fast) way of calculating A * A' without using numpy or scipy?
The numpy equivalent of the above would be:
def foo(a):
    # matrix product; for a plain ndarray, a * a.T would multiply elementwise
    return a.dot(a.T)
Since your data consists of zeros and ones, probably the best non-numpy solution is to use bitarrays:
def dot_self(matrix):
    """ Multiply a 0-1 matrix by its transpose.
    Use bitarrays to possibly speed up calculations.
    """
    from bitarray import bitarray
    rows = tuple(bitarray(row) for row in matrix)
    return [[(r & c).count() for c in rows] for r in rows]
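If installing bitarray is not an option, the same AND-then-count idea can be sketched with plain Python integers used as bitmasks. This is a swapped-in technique rather than the bitarray approach itself, and the name dot_self_int is made up for this illustration; no timing claim is made for it.

```python
def dot_self_int(matrix):
    """Multiply a 0-1 matrix by its transpose using int bitmasks (stdlib only)."""
    # Pack each row into one integer. The bit order does not matter as long
    # as every row is packed the same way. `or "0"` guards against empty rows.
    rows = [int("".join("1" if x else "0" for x in row) or "0", 2)
            for row in matrix]
    # AND keeps the positions where both rows hold a 1; counting the set
    # bits then gives the dot product of the two rows.
    return [[bin(r & c).count("1") for c in rows] for r in rows]


a = [[1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 1], [1, 1, 0, 0]]
print(dot_self_int(a))
# [[2, 0, 1, 1], [0, 2, 1, 1], [1, 1, 2, 0], [1, 1, 0, 2]]
```

On Python 3.10 and later, (r & c).bit_count() could replace the bin(...).count("1") call.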
If bitarray can't be installed, the 1d solution from "Dot Product in Python without NumPy" (https://stackoverflow.com/a/35241087/901925) can be used with the same nested comprehension. This does not take advantage of the 0/1 nature of the data, so it is basically an exercise in nested iteration.
def dot1d(a, b):
    return sum(x*y for x, y in zip(a, b))
def dot_2cmp(a):
    return [[dot1d(r, c) for c in a] for r in a]
itertools.product can be used to iterate over the row and column combinations, but the result is a 1d list, which then needs to be grouped into rows (this step is fast):
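import itertools

def dot2d(a):
    aa = [dot1d(x, y) for x, y in itertools.product(a, a)]
    # the stride picks columns of the flat list; the result is symmetric, so they equal the rows
    return [aa[i::len(a)] for i in range(len(a))]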
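The strided slice aa[i::len(a)] actually collects columns of the flat list, which is harmless here only because A * A' is symmetric. As a sketch of the grouping step that does not rely on symmetry, the version below chunks the flat list in row-major order and accepts two matrices; the name dot2d_general and the second argument are assumptions added for illustration.

```python
import itertools


def dot1d(a, b):
    return sum(x * y for x, y in zip(a, b))


def dot2d_general(a, b):
    """Return a * b' (each row of a dotted with each row of b)."""
    n = len(b)  # entries per output row
    # product(a, b) varies b fastest, so the flat list is in row-major order
    flat = [dot1d(x, y) for x, y in itertools.product(a, b)]
    # cut the flat list into consecutive chunks of n entries, one per row of a
    return [flat[i * n:(i + 1) * n] for i in range(len(a))]


a = [[1, 0], [1, 1]]
b = [[1, 1], [0, 1]]
print(dot2d_general(a, b))
# [[1, 0], [2, 1]]
```

Calling dot2d_general(a, a) gives the same result as dot2d(a).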
Testing:
a=[[1,0,1,0],[0,1,0,1],[0,0,1,1],[1,1,0,0]]
In [246]: dot2d(a)
Out[246]: [[2, 0, 1, 1], [0, 2, 1, 1], [1, 1, 2, 0], [1, 1, 0, 2]]
In [247]: dot_2cmp(a)
Out[247]: [[2, 0, 1, 1], [0, 2, 1, 1], [1, 1, 2, 0], [1, 1, 0, 2]]
In [248]: np.dot(np.array(a),np.array(a).T).tolist()
Out[248]: [[2, 0, 1, 1], [0, 2, 1, 1], [1, 1, 2, 0], [1, 1, 0, 2]]
In timings on a larger list (100x100), the two list versions take the same time. The array version, even with the list-to-array and array-to-list conversions, is roughly 30 times faster.
In [254]: b=np.random.randint(0,2,(100,100)).tolist()
In [255]: timeit np.dot(np.array(b),np.array(b).T).tolist()
100 loops, best of 3: 5.46 ms per loop
In [256]: timeit dot2d(b)
10 loops, best of 3: 177 ms per loop
In [257]: timeit dot_2cmp(b)
10 loops, best of 3: 177 ms per loop
The result is symmetric, so it might be worth the effort to skip the duplicate calculations. Mapping them back onto the nested list will be more work than in numpy. Computing only the upper triangle takes about half the time:
In [265]: timeit [[dot1d(r,c) for c in b[i:]] for i,r in enumerate(b)]
10 loops, best of 3: 90.1 ms per loop
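One possible way to map the half-computed values back onto a full nested list is sketched below: each dot product is calculated once and written to both mirrored positions. The name dot_sym is hypothetical, and this version was not part of the timings above.

```python
def dot1d(a, b):
    return sum(x * y for x, y in zip(a, b))


def dot_sym(a):
    """Compute a * a' while evaluating each off-diagonal pair only once."""
    n = len(a)
    out = [[0] * n for _ in range(n)]
    for i, r in enumerate(a):
        for j in range(i, n):  # upper triangle, including the diagonal
            # store the value at (i, j) and at its mirror position (j, i)
            out[i][j] = out[j][i] = dot1d(r, a[j])
    return out


a = [[1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 1], [1, 1, 0, 0]]
print(dot_sym(a))
# [[2, 0, 1, 1], [0, 2, 1, 1], [1, 1, 2, 0], [1, 1, 0, 2]]
```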
For what it's worth, I don't consider any of these solutions 'more Pythonic' than the others. As long as it is written in clear, running Python, it is Pythonic.