Python tuples: compare and merge without for loops

Question

I have two lists with over 100,000 tuples each. Each tuple in the first list holds two strings; each tuple in the second list holds five. Every tuple in the first list shares a common value with a tuple in the other list. For example:

tuple1 = [('a','1'), ('b','2'), ('c','3')]
tuple2 = [('$$$','a','222','###','HHH'), ('ASA','b','QWER','TY','GFD'), ('aS','3','dsfs','sfs','sfs')]

I have a function that removes the redundant tuple values and matches on the information that is important:

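def match_comment_and_thread_data(tuple1, tuple2):
    # Drop the redundant first field of every tuple2 entry
    out_thread_tuples = [(b, c, d, e) for a, b, c, d, e in tuple2]
    print('Out Thread Tuples Done')
    # Nested loop: every tuple1 entry is compared with every trimmed tuple2 entry
    final_list = [y + x[1:] for y in tuple1 for x in out_thread_tuples if x[0] in y]
    return final_list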

which ought to return:

 final_list = [('a','1','222','###','HHH'), ('b','2','QWER','TY','GFD'), ('c','3','dsfs','sfs','sfs')]

However, the lists are very long, so comparing every pair takes a long time. Is there any way to avoid the cost of nested for loops when comparing and matching tuple values?

In your example, it seems that it's always the third entry of a tuple2 element that matches (edit) an entry of a tuple1 element. Is that consistent throughout? — jedwards, Jun 08 '18 at 03:54
You should convert your second list into a dictionary with the common value as the key to speed up finding the right element. — Klaus D., Jun 08 '18 at 03:55
@jedwards yes, the placement of the matching items within the tuples is consistent. — JRR, Jun 08 '18 at 13:17
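A minimal sketch of Klaus D.'s suggestion, assuming (as JRR confirms) that the common value always sits in the same field, here the second field of each tuple2 entry. One subtlety: a plain dict keeps only the last tuple2 entry for a given key, while a nested loop emits every match. If duplicate keys are possible, grouping with `collections.defaultdict(list)` keeps them all. The names `build_index` and `match_all`, and the duplicate-key sample data, are hypothetical.

```python
from collections import defaultdict


def build_index(tuple2):
    # Hypothetical helper: map each common value (the second field) to a list of
    # the remaining fields, so entries sharing a key are all kept, not just the last
    index = defaultdict(list)
    for _, key, *rest in tuple2:
        index[key].append(tuple(rest))
    return index


def match_all(tuple1, tuple2):
    # Join on the first field of each tuple1 entry, emitting one row per match
    index = build_index(tuple2)
    final_list = []
    for pair in tuple1:
        # .get avoids inserting empty lists into the defaultdict on a miss
        for rest in index.get(pair[0], ()):
            final_list.append(pair + rest)
    return final_list


tuple1 = [('a', '1'), ('b', '2')]
# Hypothetical data: the third entry repeats the key 'a'
tuple2 = [('$$$', 'a', '222', '###', 'HHH'),
          ('ASA', 'b', 'QWER', 'TY', 'GFD'),
          ('XX', 'a', 'second', 'match', 'here')]
print(match_all(tuple1, tuple2))
```

Building the index is one pass over tuple2, and each tuple1 entry then costs one average O(1) lookup plus the number of matches it actually has.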

Joran Beasley · Answer 1 · score 0 · 2018-06-08T04:08:00.127

tuple1 = [('a','1'), ('b','2'), ('c','3')]
tuple2 = [('$$$','a','222','###','HHH'), ('ASA','b','QWER','TY','GFD'), ('aS','3','dsfs','sfs','sfs')]

def match_comment_and_thread_data(tuple1, tuple2):
    i = 0
    out_thread_dict = {b: (c, d, e) for a, b, c, d, e in tuple2}
    final_list = [x + out_thread_dict.get(x[0],out_thread_dict.get(x[1])) for x in tuple1]
    return final_list

By using a dictionary instead, each lookup is O(1) on average. You still have to visit each item in tuple1, but the match itself is fast, so the whole join is roughly O(n + m) instead of O(n * m). That said, you need a lot more than 3 values to see the benefit.
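To see where the benefit kicks in, here is a rough benchmark sketch on synthetic data. The data generator, the sizes and the function names `nested_join` / `dict_join` are hypothetical; both joins assume the common value is the second field of a tuple2 entry and the first field of a tuple1 entry, and that keys are unique. Actual timings depend on your machine, so measure on your own data.

```python
import random
import timeit


def nested_join(tuple1, tuple2):
    # O(n * m): every tuple1 entry is compared with every tuple2 entry
    return [y + x[2:] for y in tuple1 for x in tuple2 if x[1] == y[0]]


def dict_join(tuple1, tuple2):
    # O(n + m): one pass builds the index, then one O(1)-average lookup per entry
    index = {x[1]: x[2:] for x in tuple2}
    return [y + index[y[0]] for y in tuple1 if y[0] in index]


def make_data(n, seed=0):
    # Hypothetical synthetic data: n unique keys, each present once in both lists
    rng = random.Random(seed)
    keys = [f"k{i}" for i in range(n)]
    tuple1 = [(k, str(i)) for i, k in enumerate(keys)]
    tuple2 = [("$$$", k, "x", "y", "z") for k in keys]
    rng.shuffle(tuple2)  # so matches are not simply at the same position
    return tuple1, tuple2


tuple1, tuple2 = make_data(1000)
# Both approaches must agree before their timings are worth comparing
assert nested_join(tuple1, tuple2) == dict_join(tuple1, tuple2)
print("nested loop:", timeit.timeit(lambda: nested_join(tuple1, tuple2), number=1))
print("dict lookup:", timeit.timeit(lambda: dict_join(tuple1, tuple2), number=1))
```

The nested version does about n * m comparisons (a million here), while the dict version does about n + m operations, so the gap widens quickly as the lists grow toward 100,000 entries.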

edited Jun 08 '18 at 04:08

answered Jun 08 '18 at 04:02

@Joran Beasley, thank you so much for your suggestion! I think this is really close to what I am looking for, but I seem to be getting the following traceback on line 4. Any idea what might be causing this? All the items in the list are strings. "TypeError: can only concatenate tuple (not "NoneType") to tuple" – JRR Jun 08 '18 at 13:53
I understand. Is there any way I can edit the above function to skip over the tuples in tuple1 that do not have a corresponding value in the dictionary? – JRR Jun 08 '18 at 15:18
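One way to do what JRR asks, sketched under the assumption that the common value may sit in either field of a tuple1 entry (as in the sample, where ('c','3') matches on '3'). The `TypeError` comes from `dict.get` returning `None` when neither field is a key; checking for `None` before concatenating skips those entries. The function name `match_skipping_missing` and the extra ('z','9') entry are hypothetical.

```python
def match_skipping_missing(tuple1, tuple2):
    # Index tuple2 by its common (second) field -> the last three fields
    out_thread_dict = {b: (c, d, e) for a, b, c, d, e in tuple2}
    final_list = []
    for x in tuple1:
        # Try the first field, then fall back to the second (same lookup as the answer)
        match = out_thread_dict.get(x[0], out_thread_dict.get(x[1]))
        if match is not None:
            final_list.append(x + match)
        # otherwise there is no partner in tuple2, so this tuple1 entry is skipped
    return final_list


tuple1 = [('a', '1'), ('b', '2'), ('c', '3'), ('z', '9')]
tuple2 = [('$$$', 'a', '222', '###', 'HHH'),
          ('ASA', 'b', 'QWER', 'TY', 'GFD'),
          ('aS', '3', 'dsfs', 'sfs', 'sfs')]
print(match_skipping_missing(tuple1, tuple2))
```

The unmatched ('z','9') entry simply does not appear in the result, and the lookup cost per entry stays O(1) on average.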

score 0 · Answer 2 · answered Jun 08 '18 at 04:03

By using a dictionary, this can be done in O(n):

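dict1 = dict(tuple1)  # maps each first field to its second field, e.g. 'a' -> '1'
# entries whose key is missing from dict1 (like '3' in the sample) are skipped
final_list = [(tup[1], dict1[tup[1]]) + tup[2:] for tup in tuple2 if tup[1] in dict1]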
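A variant of this approach that also covers the sample's third entry, where the tuple2 value '3' matches the second field of ('c','3'). This is a sketch under the assumption that no value appears as a key in two different tuple1 pairs; the function name `match_via_tuple1_index` is hypothetical. Each tuple1 pair is indexed under both of its fields, then tuple2 is scanned once.

```python
def match_via_tuple1_index(tuple1, tuple2):
    # Index each tuple1 pair under BOTH of its fields (assumes no key collisions)
    lookup = {}
    for pair in tuple1:
        for key in pair:
            lookup[key] = pair
    # One pass over tuple2; entries with no partner in tuple1 are skipped
    return [lookup[tup[1]] + tup[2:] for tup in tuple2 if tup[1] in lookup]


tuple1 = [('a', '1'), ('b', '2'), ('c', '3')]
tuple2 = [('$$$', 'a', '222', '###', 'HHH'),
          ('ASA', 'b', 'QWER', 'TY', 'GFD'),
          ('aS', '3', 'dsfs', 'sfs', 'sfs')]
print(match_via_tuple1_index(tuple1, tuple2))
```

On the sample data this reproduces the final_list from the question, in tuple2 order. The index doubles the number of keys, which is still linear in the size of tuple1.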

Ernie Yang
