-2

So A is a list of lists containing only 0s and 1s. What is the most Pythonic (and also fairly fast) way of calculating A * A' without using numpy or scipy?

The numpy equivalent of the above would be:

def foo(a):
    # matrix product; for a plain ndarray, a * a.T would be elementwise
    return a.dot(a.T)
Soumyajit
  • Just write the algorithm. If you need performance, use numpy. Your question doesn't make much sense other than as a thought experiment (then you could write down your own thoughts) or as homework. – Ulrich Eckhardt Feb 06 '16 at 11:15
  • Yes I do need speed, but the problem is most of the OJ's out there do not support numpy. I was hoping for a solution along the lines of zip() and itertools :\ – Soumyajit Feb 06 '16 at 11:38
  • I just saw one here for 1d lists, using `sum` and `zip`. That's simple since it results in just one number. – hpaulj Feb 06 '16 at 12:05
  • http://stackoverflow.com/q/35208160/901925 – hpaulj Feb 06 '16 at 12:10
  • @UlrichEckhardt it makes sense from the OP's point of view to ask whether there is any algorithm that takes advantage of the fact that the data is composed only of zeroes and ones. – rewritten Feb 06 '16 at 12:38
  • Please show what you've tried so far. – tom10 Feb 06 '16 at 15:04
  • What is OJ? Why can't you install numpy where you need it, if performance matters? – Jeff Hammond Feb 06 '16 at 16:01
  • @tom10, I tried only with numpy to find out I cannot use it. I did not write the for-loop matrix multiplication myself and was thus looking for some existing code here. For transposing I am using at = zip(*a) – Soumyajit Feb 09 '16 at 13:59

2 Answers

2

Since your data consists only of zeroes and ones, probably the best non-numpy solution is to use bitarrays (from the third-party bitarray package):

def dot_self(matrix):
    """ Multiply a 0-1 matrix by its transpose.
    Use bitarrays to possibly speed up calculations.
    """
    from bitarray import bitarray
    rows = tuple(bitarray(row) for row in matrix)
    return [[(r & c).count() for c in rows] for r in rows]
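
If nothing can be installed, the same idea can be sketched with plain Python integers standing in for the bitarrays. This is only an illustration under stated assumptions: `dot_self_bits` is a hypothetical name, every row is assumed to be non-empty, and every entry is assumed to be the integer 0 or 1.

```python
def dot_self_bits(matrix):
    """Multiply a 0-1 matrix by its transpose, using ints as bit sets.

    Assumes each row is non-empty and holds only the integers 0 and 1.
    """
    # Pack each row into a single integer, e.g. [1, 0, 1, 0] -> 0b1010.
    rows = [int("".join(map(str, row)), 2) for row in matrix]
    # For 0-1 rows, the dot product is the number of positions where
    # both rows hold a 1, i.e. the number of set bits in r & c.
    return [[bin(r & c).count("1") for c in rows] for r in rows]


a = [[1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 1], [1, 1, 0, 0]]
print(dot_self_bits(a))
# [[2, 0, 1, 1], [0, 2, 1, 1], [1, 1, 2, 0], [1, 1, 0, 2]]
```

`bin(...).count("1")` is used because it works on old and new Python versions alike; whether this beats the plain `sum`/`zip` approach depends on the matrix size and should be measured.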
rewritten
  • If he can't install `numpy`, can he install `bitarray`? That's a third-party compiled package too. – hpaulj Feb 06 '16 at 17:33
  • You are right, it's also a third-party package. Maybe he can install it, in which case it would be a fine solution. The OP did specify numpy and scipy only. – rewritten Feb 06 '16 at 19:55
  • I can't install anything. Thanks for this answer anyway, I didn't know about bitarrays :) – Soumyajit Feb 09 '16 at 14:02
1

If bitarray can't be installed, the 1d solution in Dot Product in Python without NumPy can be used with the same nested comprehension (https://stackoverflow.com/a/35241087/901925). This does not take advantage of the 0/1 nature of the data. So basically it's an exercise in nested iterations.

def dot1d(a,b):
    return sum(x*y for x,y in zip(a,b))

def dot_2cmp(a):
    return [[dot1d(r,c) for c in a] for r in a]

itertools.product can be used to iterate over the row and column combinations, but the result is a 1d list, which then needs to be grouped (but this step is fast):

import itertools

def dot2d(a):
    n = len(a)
    aa = [dot1d(x, y) for x, y in itertools.product(a, a)]
    return [aa[i*n:(i+1)*n] for i in range(n)]  # contiguous slices are the rows

testing:

a=[[1,0,1,0],[0,1,0,1],[0,0,1,1],[1,1,0,0]]

In [246]: dot2d(a)
Out[246]: [[2, 0, 1, 1], [0, 2, 1, 1], [1, 1, 2, 0], [1, 1, 0, 2]]
In [247]: dot_2cmp(a)
Out[247]: [[2, 0, 1, 1], [0, 2, 1, 1], [1, 1, 2, 0], [1, 1, 0, 2]]
In [248]: np.dot(np.array(a),np.array(a).T).tolist()
Out[248]: [[2, 0, 1, 1], [0, 2, 1, 1], [1, 1, 2, 0], [1, 1, 0, 2]]

In timings on a larger list, the two list-based versions take the same time. The array version, even with the conversion to and from arrays, is considerably faster: roughly 30 times here (5.46 ms versus 177 ms).

In [254]: b=np.random.randint(0,2,(100,100)).tolist()
In [255]: timeit np.dot(np.array(b),np.array(b).T).tolist()
100 loops, best of 3: 5.46 ms per loop
In [256]: timeit dot2d(b)
10 loops, best of 3: 177 ms per loop
In [257]: timeit dot_2cmp(b)
10 loops, best of 3: 177 ms per loop
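
The `timeit` lines above are IPython magics. Where IPython is not available, a similar measurement can be sketched with the standard-library `timeit` module. The random seed, the repeat counts and the variable name `best` are assumptions made for this illustration, and the absolute numbers will differ from machine to machine.

```python
import random
import timeit


def dot1d(a, b):
    return sum(x * y for x, y in zip(a, b))


def dot_2cmp(a):
    return [[dot1d(r, c) for c in a] for r in a]


# Build a random 100 x 100 matrix of 0s and 1s as a nested list.
random.seed(0)
b = [[random.randint(0, 1) for _ in range(100)] for _ in range(100)]

# Best of 3 runs of 3 calls each, converted to seconds per call.
best = min(timeit.repeat(lambda: dot_2cmp(b), number=3, repeat=3)) / 3
print("dot_2cmp: %.1f ms per call" % (best * 1e3))
```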

The result is symmetric, so it might be worth the effort to skip the duplicate calculations. Mapping them back on to the nested list will be more work than in numpy.

In [265]: timeit [[dot1d(r,c) for c in b[i:]] for i,r in enumerate(b)]
10 loops, best of 3: 90.1 ms per loop
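
The line timed above computes only the upper triangle (a ragged list). One way to map those values back onto the full nested list is sketched below; `dot_sym` is a hypothetical name, and `dot1d` is repeated only to keep the example self-contained.

```python
def dot1d(a, b):
    return sum(x * y for x, y in zip(a, b))


def dot_sym(a):
    """Compute a * a' by evaluating each (i, j) pair with j >= i once."""
    n = len(a)
    out = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # The result is symmetric, so one dot product fills two cells.
            out[i][j] = out[j][i] = dot1d(a[i], a[j])
    return out


a = [[1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 1], [1, 1, 0, 0]]
print(dot_sym(a))
# [[2, 0, 1, 1], [0, 2, 1, 1], [1, 1, 2, 0], [1, 1, 0, 2]]
```

The explicit loops add some overhead compared with the comprehension, so the gain over the full calculation may be smaller than the halved dot-product count suggests.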

For what it's worth, I don't consider any of these solutions 'more Pythonic' than the others. As long as it is written in clear, running Python, it is Pythonic.

hpaulj