I have emails and dates. I can use two nested for loops to pick out the emails sent on the same date, but how can I do it the 'smart way', i.e. efficiently?

# list of tuples - (email, date)

list_one_date_emails = []
for entry in list_emails_dates:
    current_date = entry[1]
    for next_entry in list_emails_dates:
        if current_date == next_entry[1]:
            list_one_date_emails.append(next_entry)

I know it can be written more concisely, but I don't know itertools. Or should I use map or xrange?

Burhan Khalid
ERJAN

2 Answers

You can just convert this to a dictionary, by collecting all emails related to a date into the same key.

To do this, you need to use defaultdict from collections. It is an easy way to give a new key in a dictionary a default value.

Here we are passing in the function list, so that each new key in the dictionary will get a list as the default value.

from collections import defaultdict

emails = defaultdict(list)
for email, email_date in list_of_tuples:
    emails[email_date].append(email)  # key by date, collect the emails

Now you have emails['2014-07-01'], which will be a list of the emails for that date.
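
As a minimal runnable sketch of grouping by date this way, here is the same idea with made-up sample data (the addresses, dates, and the name emails_by_date are illustrative assumptions, not part of the question):

```python
from collections import defaultdict

# Hypothetical sample data: (email, date) tuples
list_of_tuples = [
    ('foo@bar.com', '2014-07-01'),
    ('zoo@foo.com', '2014-07-01'),
    ('a@b.com', '2014-07-03'),
]

emails_by_date = defaultdict(list)
for email, email_date in list_of_tuples:
    # Missing keys start out as an empty list, so we can append directly
    emails_by_date[email_date].append(email)

print(emails_by_date['2014-07-01'])  # ['foo@bar.com', 'zoo@foo.com']
```

This makes a single pass over the list, so the work grows linearly with the number of entries rather than quadratically as with the nested loops.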

If we don't use a defaultdict, and do a dictionary comprehension like this:

emails = {x[1]:x[0] for x in list_of_tuples}

You'll have one entry for each date, which will be the last email for that date, since assigning to the same key overwrites its value. A dictionary is the most efficient way to look up a value by a key. A list is good if you want to look up a value by its position in a series of values (assuming you know its position).
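
A small sketch with made-up data shows the overwrite. It also shows dict.setdefault as one possible way to group with a plain dictionary; that alternative is an addition for illustration and the variable names are assumptions:

```python
# Hypothetical sample data: (email, date) tuples
list_of_tuples = [
    ('foo@bar.com', '2014-07-01'),
    ('zoo@foo.com', '2014-07-01'),
    ('a@b.com', '2014-07-03'),
]

# The comprehension keeps only the last email seen for each date
last_only = {x[1]: x[0] for x in list_of_tuples}
print(last_only['2014-07-01'])  # zoo@foo.com

# A plain dict can still group if each key is initialised to a list first
grouped = {}
for email, email_date in list_of_tuples:
    grouped.setdefault(email_date, []).append(email)
print(grouped['2014-07-01'])  # ['foo@bar.com', 'zoo@foo.com']
```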

If for some reason you are not able to restructure the data, you can use this helper function, which creates a generator:

def find_by_date(haystack, needle):
    for email, email_date in haystack:
        if email_date == needle:
            yield email

Here is how you would use it:

>>> email_list = [('foo@bar.com','2014-07-01'), ('zoo@foo.com', '2014-07-01'), ('a@b.com', '2014-07-03')] 
>>> all_emails = list(find_by_date(email_list, '2014-07-01'))
>>> all_emails
['foo@bar.com', 'zoo@foo.com']

Or, you can do this:

>>> july_first = find_by_date(email_list, '2014-07-01')
>>> next(july_first)
'foo@bar.com'
>>> next(july_first)
'zoo@foo.com'
Burhan Khalid

I would use itertools.groupby (and it's a good excuse to try itertools):

itertools.groupby(list_of_tuples, lambda x: x[1])

which yields (date, group) pairs, with the emails grouped by date (x[1]). Note that groupby only groups consecutive items, so you have to sort the list by the same component first (sorted(list_of_tuples, key=lambda x: x[1])).

One nice thing (other than telling the reader that we are grouping) is that groupby works lazily and, if the list is long, the cost is dominated by the O(n log n) sort instead of the O(n^2) nested loop.
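
Putting the sort and the grouping together, a minimal sketch might look like this. The sample data and the names by_date and grouped are made up for illustration, and operator.itemgetter(1) is used here as an equivalent of lambda x: x[1]:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample data, deliberately not sorted by date
list_of_tuples = [
    ('foo@bar.com', '2014-07-01'),
    ('a@b.com', '2014-07-03'),
    ('zoo@foo.com', '2014-07-01'),
]

by_date = itemgetter(1)  # same key for sorting and for grouping

# groupby only merges adjacent items, so sort by the grouping key first
grouped = {
    date: [email for email, _ in group]
    for date, group in groupby(sorted(list_of_tuples, key=by_date), key=by_date)
}

print(grouped)
```

Each group is an iterator that is consumed as groupby advances, which is why the emails are collected into a list inside the comprehension.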

fricke