I have two lists of over 100,000 tuples each. Each tuple in the first list holds two strings; each tuple in the second holds five. Every tuple in the first list shares a common value with one tuple in the other list. For example:

tuple1 = [('a','1'), ('b','2'), ('c','3')]
tuple2 = [('$$$','a','222','###','HHH'), ('ASA','b','QWER','TY','GFD'), ('aS','3','dsfs','sfs','sfs')] 

I have a function that is able to remove redundant tuple values and match on the information that is important:

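def match_comment_and_thread_data(tuple1, tuple2):
    # Drop the redundant first field of each tuple2 entry
    out_thread_tuples = [(b, c, d, e) for a, b, c, d, e in tuple2]
    print('Out Thread Tuples Done')
    # Nested loop: every tuple2 entry is compared against every tuple1 entry
    final_list = [y + x[1:] for x in out_thread_tuples for y in tuple1 if x[0] in y]
    return final_list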

which ought to return:

 final_list = [('a','1','222','###','HHH'), ('b','2','QWER','TY','GFD'), ('c','3','dsfs','sfs','sfs')]

However, the lists are insanely long. Is there any way to get around the computational time commitment of for loops when comparing and matching tuple values?

JRR
  • In your example, it seems that it's always the third entry of a tuple2 element that matches (edit) an entry of a tuple1 element. Is that consistent throughout? – jedwards Jun 08 '18 at 03:54
  • You should convert your second list into a dictionary with the common value as key to speed up finding the right element. – Klaus D. Jun 08 '18 at 03:55
  • @jedwards yes, the placement of the matching items within the tuples are consistent. – JRR Jun 08 '18 at 13:17

2 Answers

0
tuple1 = [('a','1'), ('b','2'), ('c','3')]
tuple2 = [('$$$','a','222','###','HHH'), ('ASA','b','QWER','TY','GFD'), ('aS','3','dsfs','sfs','sfs')]

def match_comment_and_thread_data(tuple1, tuple2):
    i = 0
    out_thread_dict = dict([(b, (c, d, e)) for a, b, c, d, e in tuple2])
    final_list = [x + out_thread_dict.get(x[0],out_thread_dict.get(x[1])) for x in tuple1]
    return final_list

By using a dictionary instead, each lookup is O(1) on average. You still have to visit each item in the first list, but the match itself is fast, although you need a lot more than 3 values to see the benefit.
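To see the difference on larger inputs, here is a rough timing sketch. The helper names `nested_loop_join` and `dict_join` and the synthetic data are made up for illustration, and it assumes the key is always the second field of the five-string tuples and the first field of the pairs.

```python
import timeit

def nested_loop_join(pairs, records):
    # Quadratic: every record is compared against every pair
    return [p + r[2:] for r in records for p in pairs if r[1] == p[0]]

def dict_join(pairs, records):
    # Build the index once; each lookup afterwards is O(1) on average
    index = {r[1]: r[2:] for r in records}
    return [p + index[p[0]] for p in pairs if p[0] in index]

# Synthetic data (hypothetical): n pairs, each with exactly one matching record
n = 500
pairs = [(f"k{i}", str(i)) for i in range(n)]
records = [("x", f"k{i}", "c", "d", "e") for i in range(n)]

# Both approaches should produce the same result
assert nested_loop_join(pairs, records) == dict_join(pairs, records)

print("nested:", timeit.timeit(lambda: nested_loop_join(pairs, records), number=3))
print("dict:  ", timeit.timeit(lambda: dict_join(pairs, records), number=3))
```

The actual timings depend on the machine, but the gap should widen quickly as `n` grows, since the nested loop does n*n comparisons while the dictionary version does roughly 2n operations.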

Joran Beasley
  • @Jordan Beasley, thank you so much for your suggestion! I think that this is really close to what I am looking for but I seem to be getting the following Traceback on Line 4. Any idea on what might be causing this? All the items in the list are strings. "TypeError: can only concatenate tuple (not "NoneType") to tuple" – JRR Jun 08 '18 at 13:53
  • I understand, is there any way that I can edit the above function to skip over those tuples in tuple1 that do not have a corresponding value in the dictionary?  – JRR Jun 08 '18 at 15:18
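
One possible way to skip the unmatched tuples, as a sketch only: keep the same dictionary lookup, but check for `None` before concatenating. This assumes the shared key is always the second field of each tuple2 entry and may appear in either position of a tuple1 entry, as in the sample data. The extra `('z', '9')` entry is added here purely to demonstrate the skip.

```python
def match_comment_and_thread_data(tuple1, tuple2):
    # Index tuple2 by its second field (the shared key), keeping the last three fields
    out_thread_dict = {b: (c, d, e) for a, b, c, d, e in tuple2}
    final_list = []
    for x in tuple1:
        # Try the first field as the key, then fall back to the second
        match = out_thread_dict.get(x[0], out_thread_dict.get(x[1]))
        if match is not None:  # skip tuples with no counterpart
            final_list.append(x + match)
    return final_list

tuple1 = [('a', '1'), ('b', '2'), ('c', '3'), ('z', '9')]  # ('z', '9') has no match
tuple2 = [('$$$', 'a', '222', '###', 'HHH'),
          ('ASA', 'b', 'QWER', 'TY', 'GFD'),
          ('aS', '3', 'dsfs', 'sfs', 'sfs')]

print(match_comment_and_thread_data(tuple1, tuple2))
# [('a', '1', '222', '###', 'HHH'), ('b', '2', 'QWER', 'TY', 'GFD'), ('c', '3', 'dsfs', 'sfs', 'sfs')]
```

The TypeError in the earlier comment comes from `.get` returning `None` when neither field is a key, so the explicit check is what avoids it.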
0

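By using a dictionary, this can be done in O(n):

dict1 = dict(tuple1)
# Look up each tuple2 entry by its second field; skip entries whose key is not in dict1
final_list = [(tup[1], dict1[tup[1]]) + tup[2:] for tup in tuple2 if tup[1] in dict1]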
Ernie Yang