
I want to compute the average number of elements that separate all possible pairs in a list of lists. The following script works nicely:

from itertools import combinations
from operator import itemgetter
from collections import defaultdict

lst = [['A','D','B',],['A','M','N','B'],['A','C','B']]
elms = set(x for l in lst for x in l)

def test1():
    d = defaultdict(list)
    for i in lst:
        combs = list(combinations(i, 2))
        combs_sorted = [sorted(pair) for pair in combs]
        for j in combs_sorted:
            a = i.index(j[0])
            b = i.index(j[1])
            d[tuple(j)].append(abs((a+1)-b))
    return d

d = test1()
d = {k: sum(v)/len(v) for k, v in d.items()}
for k,v in d.items():
    print(k,v)

and the result is the desired one:

('A', 'D') 0.0
('A', 'B') 1.3333333333333333
('B', 'D') 2.0
('A', 'M') 0.0
('A', 'N') 1.0
('M', 'N') 0.0
('B', 'M') 3.0
('B', 'N') 2.0
('A', 'C') 0.0
('B', 'C') 2.0

However, that script is quite slow when the number of lists and elements grows considerably. I tried to use multiprocessing, following this answer:

import multiprocessing as mp

def init2(child_conn):
    d = defaultdict(list)
    for i in lst:
        combs = list(combinations(i, 2))
        combs_sorted = [sorted(pair) for pair in combs]
        for j in combs_sorted:
            a = i.index(j[0])
            b = i.index(j[1])
            d[tuple(j)].append(abs((a+1)-b))
    child_conn.send(d)

def test2():
    parent_conn, child_conn = mp.Pipe(duplex=False)
    p = mp.Process(target=init2, args=(child_conn,))
    p.start()
    d = parent_conn.recv()
    p.join()
    return d

d = test2()
d = {k: sum(v)/len(v) for k, v in d.items()}
for k,v in d.items():
    print(k,v)

but this script seems to be even slower than the previous one:

import time

t = time.process_time()
test1()
print(time.process_time() - t)

6.0000000000004494e-05

t = time.process_time()
test2()
print(time.process_time() - t)

0.017596

How can I speed up this calculation?

pppery
dcirillo

1 Answer

  1. Unless this is just an illustrative toy example, I wonder why you bother to accelerate a 60 µs calculation.
  2. You are opening only one child process that does all the work, so no performance gain should be expected.
  3. Even if you open more, the overhead of spawning the processes plus the Pipe is much bigger compared to your tiny 60 µs calculation.
  4. Using multiprocessing is effective when:
     • your base calculation is much slower than the multiprocessing overhead, or
     • you have a pre-created worker pool that is waiting for tasks. With that configuration (mostly found on servers) you pay only for the communication (which, by the way, also takes longer than your 60 µs).

So, bottom line: for such a short calculation, stay with one process.
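For what it's worth, if the real workload ever grows past that overhead, the pool configuration described above might look roughly like this. This is only a sketch: `pair_distances` is a hypothetical helper that repeats the question's per-list computation, and the work is split one sublist per task so the pool can actually run sublists in parallel.

```python
from collections import defaultdict
from itertools import combinations
from multiprocessing import Pool

lst = [['A', 'D', 'B'], ['A', 'M', 'N', 'B'], ['A', 'C', 'B']]

def pair_distances(sub):
    # The per-list computation from test1, applied to a single sublist.
    d = defaultdict(list)
    for j in (sorted(pair) for pair in combinations(sub, 2)):
        a = sub.index(j[0])
        b = sub.index(j[1])
        d[tuple(j)].append(abs((a + 1) - b))
    return d

if __name__ == '__main__':
    # One task per sublist, spread over a pool of pre-created workers.
    with Pool() as pool:
        partials = pool.map(pair_distances, lst)
    # Merge the per-list results, then average as in the original script.
    merged = defaultdict(list)
    for partial in partials:
        for k, v in partial.items():
            merged[k].extend(v)
    averages = {k: sum(v) / len(v) for k, v in merged.items()}
    print(averages[('A', 'B')])  # 1.3333333333333333
```

Even so, for lists this small the pool startup and pickling of results will dominate, which is exactly the point above.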

Lior Cohen