2

I want to return a list of tuples with unique IDs, keeping for each ID the tuple with the most recent date.

The unique ID is in the first element of each tuple (i.e. 1,2,3,4).

The dates appear in more than one position in each tuple (the 4th and 6th elements).

a = [(1,'Y', 'rat', datetime.datetime(2016, 12, 12, 0, 0), 'N', None),
(2,'Y', 'ox', datetime.datetime(2017, 9, 4, 0, 0), 'N', None),
(1,'N', None, None, 'Y', datetime.datetime(2017, 9, 17, 0, 0)),
(2,'N', None, None, 'Y', datetime.datetime(2017, 3, 16, 0, 0)),
(3,'Y', 'tiger', datetime.datetime(2013, 1, 18, 0, 0), 'N', None),
(4,'N', None, None, 'Y', datetime.datetime(2017, 10, 3, 0, 0))]

The output I am expecting is:

b = [(1,'N', None, None, 'Y', datetime.datetime(2017, 9, 17, 0, 0)),
(2,'Y', 'ox', datetime.datetime(2017, 9, 4, 0, 0), 'N', None),
(3,'Y', 'tiger', datetime.datetime(2013, 1, 18, 0, 0), 'N', None),
(4,'N', None, None, 'Y', datetime.datetime(2017, 10, 3, 0, 0))]

I've sorted the tuples by ID and grouped them into a dictionary using groupby.

from itertools import groupby
groups = {}  # not named `dict`, which would shadow the built-in type
f = lambda x: x[0]
for key, group in groupby(sorted(a, key=f), f):
    groups[key] = list(group)

This is the dictionary output:

{1: [(1, 'Y', 'rat', datetime.datetime(2016, 12, 12, 0, 0), 'N', None), 
(1, 'N', None, None, 'Y', datetime.datetime(2017, 9, 17, 0, 0))], 
2: [(2, 'Y', 'ox', datetime.datetime(2017, 9, 4, 0, 0), 'N', None), 
(2, 'N', None, None, 'Y', datetime.datetime(2017, 3, 16, 0, 0))], 
3: [(3, 'Y', 'tiger', datetime.datetime(2013, 1, 18, 0, 0), 'N', None)], 
4: [(4, 'N', None, None, 'Y', datetime.datetime(2017, 10, 3, 0, 0))]}

From this step I am having trouble extracting the dictionary values that I want into a new list.

Thanks for your help in advance!

mfn
  • Welcome to stackoverflow! You are much more likely to get assistance from the community if you provide a snippet of code that you wrote attempting to solve the problem. – Maciej Dec 20 '17 at 21:38
  • what do you mean by "keep the tuple with most recent date"? keep it where? – sam-pyt Dec 20 '17 at 21:41
  • this might be closely related: https://stackoverflow.com/questions/3922644/find-oldest-youngest-datetime-object-in-a-list#3922675 – jmunsch Dec 20 '17 at 21:41
  • @sam-pyt I want to put the tuples into a new list - sorry for not being clear, I will edit my question to make it clearer. – mfn Dec 20 '17 at 21:49
  • mfn, are your `id`s small numbers or can they be anything? – Paul Panzer Dec 20 '17 at 22:01
  • @PaulPanzer the id's are always integers, up to 9 digits long – mfn Dec 20 '17 at 22:10
  • That's a teeny bit too large for what I had in mind :( – Paul Panzer Dec 20 '17 at 22:11

3 Answers

4

First, you could define a function to get the datetime from a tuple, regardless of its position. Then you could sort the list in reverse by ID and datetime, group by ID, take the first entry of each group (the most recent one), and sort again (so the result is sorted by ID).

>>> getdate = lambda t: next(x for x in t if isinstance(x, datetime.datetime))
>>> sorted(next(g) for k, g in itertools.groupby(sorted(a, key=lambda t: (t[0], getdate(t)), reverse=True), key=lambda t: t[0]))
[(1, 'N', None, None, 'Y', datetime.datetime(2017, 9, 17, 0, 0)),
 (2, 'Y', 'ox', datetime.datetime(2017, 9, 4, 0, 0), 'N', None),
 (3, 'Y', 'tiger', datetime.datetime(2013, 1, 18, 0, 0), 'N', None),
 (4, 'N', None, None, 'Y', datetime.datetime(2017, 10, 3, 0, 0))]
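If the one-liner is hard to follow, the same steps can be spelled out roughly as below. This is only a sketch: `a` is the list from the question, `getdate` is written as a regular function, and the intermediate names (`newest_first`, `latest`, `result`) are made up for illustration.

```python
import datetime
import itertools

a = [(1, 'Y', 'rat', datetime.datetime(2016, 12, 12, 0, 0), 'N', None),
     (2, 'Y', 'ox', datetime.datetime(2017, 9, 4, 0, 0), 'N', None),
     (1, 'N', None, None, 'Y', datetime.datetime(2017, 9, 17, 0, 0)),
     (2, 'N', None, None, 'Y', datetime.datetime(2017, 3, 16, 0, 0)),
     (3, 'Y', 'tiger', datetime.datetime(2013, 1, 18, 0, 0), 'N', None),
     (4, 'N', None, None, 'Y', datetime.datetime(2017, 10, 3, 0, 0))]


def getdate(t):
    """Return the first datetime found in the tuple, whatever its position."""
    return next(x for x in t if isinstance(x, datetime.datetime))


# Step 1: sort descending by (ID, date), so the newest entry of each ID comes first.
newest_first = sorted(a, key=lambda t: (t[0], getdate(t)), reverse=True)

# Step 2: group consecutive entries by ID and keep only the first one of each group.
latest = [next(group)
          for _, group in itertools.groupby(newest_first, key=lambda t: t[0])]

# Step 3: restore ascending ID order.
result = sorted(latest, key=lambda t: t[0])
```

Note that `getdate` assumes every tuple contains at least one datetime; a tuple without one would make `next` raise `StopIteration`.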

Or a bit shorter, sorting just once by ID and then getting the max by date; same result:

>>> [max(g, key=getdate) for k, g in itertools.groupby(sorted(a, key=lambda t: t[0]), key=lambda t: t[0])]

Of course, the same would also be possible (and faster) with a simple loop and a dictionary...

d = dict()
for t in a:
    if t[0] not in d or getdate(d[t[0]]) < getdate(t):
        d[t[0]] = t

...but hey, nothing beats an overcomplicated one-liner!
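For completeness, here is a self-contained sketch of the loop version that also turns the dictionary back into the list asked for in the question. The names `latest_by_id` and `b` are just illustrative, and it assumes every tuple holds at least one datetime.

```python
import datetime

a = [(1, 'Y', 'rat', datetime.datetime(2016, 12, 12, 0, 0), 'N', None),
     (2, 'Y', 'ox', datetime.datetime(2017, 9, 4, 0, 0), 'N', None),
     (1, 'N', None, None, 'Y', datetime.datetime(2017, 9, 17, 0, 0)),
     (2, 'N', None, None, 'Y', datetime.datetime(2017, 3, 16, 0, 0)),
     (3, 'Y', 'tiger', datetime.datetime(2013, 1, 18, 0, 0), 'N', None),
     (4, 'N', None, None, 'Y', datetime.datetime(2017, 10, 3, 0, 0))]


def getdate(t):
    """Return the first datetime found in the tuple, whatever its position."""
    return next(x for x in t if isinstance(x, datetime.datetime))


# Single pass: remember, per ID, the tuple with the latest date seen so far.
latest_by_id = {}
for t in a:
    key = t[0]
    if key not in latest_by_id or getdate(latest_by_id[key]) < getdate(t):
        latest_by_id[key] = t

# Pull the winners out of the dictionary, ordered by ID.
b = [latest_by_id[key] for key in sorted(latest_by_id)]
```

This needs only one pass over the input plus a sort of the unique IDs, rather than a sort of the whole list.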

tobias_k
0

In my opinion, you need to write custom code. There is no built-in function in Python that does exactly what you want.

You can use plain Python code or a more data-oriented library such as Pandas.

The main idea is this:

result = dict()

for item in a:
    if item[0] not in result:
        result[item[0]] = ...
    else:
        if result[item[0]][5] < item[5]:
            result[item[0]] = ...

I have left out the details; this is just the general idea.

Jonathan DEKHTIAR
  • The date is not always in position `[5]`. Feel free to use my `getdate` function to combine it with your dict approach if you like. – tobias_k Dec 20 '17 at 21:51
  • I am not planning to do the job for you. I just want to give you the necessary knowledge or a feasible approach to answer your need. – Jonathan DEKHTIAR Dec 22 '17 at 13:27
0

You can try this:

import datetime
import itertools
a = [(1,'Y', 'rat', datetime.datetime(2016, 12, 12, 0, 0), 'N', None),
 (2,'Y', 'ox', datetime.datetime(2017, 9, 4, 0, 0), 'N', None),
 (1,'N', None, None, 'Y', datetime.datetime(2017, 9, 17, 0, 0)),
 (2,'N', None, None, 'Y', datetime.datetime(2017, 3, 16, 0, 0)),
 (3,'Y', 'tiger', datetime.datetime(2013, 1, 18, 0, 0), 'N', None),
 (4,'N', None, None, 'Y', datetime.datetime(2017, 10, 3, 0, 0))]
by_id = lambda x: x[0]
# first datetime in the tuple, whatever its position
first_date = lambda x: [h for h in x if isinstance(h, datetime.datetime)][0]
# for each ID, sort its tuples newest first and keep the first one
new_s = [sorted(b, key=first_date, reverse=True)[0] for _, b in itertools.groupby(sorted(a, key=by_id), key=by_id)]

Output:

[(1, 'N', None, None, 'Y', datetime.datetime(2017, 9, 17, 0, 0)), (2, 'Y', 'ox', datetime.datetime(2017, 9, 4, 0, 0), 'N', None), (3, 'Y', 'tiger', datetime.datetime(2013, 1, 18, 0, 0), 'N', None), (4, 'N', None, None, 'Y', datetime.datetime(2017, 10, 3, 0, 0))]
Ajax1234