I have a script that creates a directory listing of all PDFs within a certain series of subdirectories. The outputs are tuples that include the year of the file, saved as a string, as well as an id for the unit that generated the report. They look something like the following:

unit1, 2010
unit2, 2002
unit2, 2005
unit2, 2010 
unit3, 2003 

What I'm now looking to do is create a report that finds the most recent report for each unit, that is, the tuple with the max value in its second item. Normally I would do this in Access with a MAX query; however, I am trying to eliminate that step and write the extract all at once. Using my original code, my output would consist of the following:

unit1, '2010'
unit2, '2010'
unit3, '2003'

I did some looking around and realized that I need to change my script so that it generates a list of the tuples matching each unique id. Using the great answer I found in "Split a list of tuples into sub-lists of the same tuple field", I was able to get the results split into a group of sublists. This means my output is now the following:

[[(unit1, '2010')],[(unit12, '2010'), (unit2, '2010'), (unit2, '2005'), (unit2, '2002')],[(unit3, '2003')]]

My difficulty now is trying to extract the tuple from each sublist that contains the highest value item. I tried the following:

import glob, os, itertools, operator  
dirtup = []
for f in glob.glob(r'P:\Office*\Technical*\Bureau*\T*\*\YR2*\R*\*\*.pdf'):
    fpath, fname = os.path.split(f)
    fyr = fpath[91:95]
    vcs = 'Volume'
    rname, extname = os.path.splitext(fname)
    rcid = fname[0:7]
    dirtup.append ((f, fyr, rcid, vcs))

dirtup2 = sorted(dirtup, key=operator.itemgetter(2))

for key, group in itertools.groupby(dirtup2, operator.itemgetter(2)):
    maxval = max(x[1] for x in dirtup2)

print [x for x in dirtup2 if x[1] == maxval] 

This returns only the tuples that match the overall max of fyr rather than the max of fyr for each sublist.

Edit

Using mgilson's first answer I was able to get the output I wanted (the tuple whose second item has the max value).

mburkenysdot
  • @selllikesybok thanks for cleaning up the code – mburkenysdot Jul 18 '12 at 16:30
  • I've edited this again to try to make the question a little clearer: I like to use print to verify things before I worry about writing the results to file, but that means on occasion I can get a bit lost. If I'm passing results between for statements it's not a good idea to get fixated on only one of them. – mburkenysdot Jul 19 '12 at 12:45

1 Answer

You can sort each sublist based on the particular field and take the first element of the sorted sublist.

for key,group in itertools.groupby(dirtup2,operator.itemgetter(2)):
    newlist=sorted(group,key=operator.itemgetter(1),reverse=True)
    tuple_with_max=newlist[0]
    print tuple_with_max
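
If you only need the single newest entry per group, a full sort of each sublist is not strictly required. Here is a minimal sketch that uses max with a key instead. The sample rows and the helper name latest_per_unit are made up for illustration; only the tuple layout (path, year, unit id, label) follows your dirtup. It also assumes the years are all four-digit strings, so comparing them as strings gives the same order as comparing them as numbers.

```python
import itertools
import operator

# Hypothetical sample rows shaped like the question's tuples:
# (path, year-as-string, unit id, label). The paths are invented.
rows = [
    ("a.pdf", "2010", "unit1", "Volume"),
    ("b.pdf", "2002", "unit2", "Volume"),
    ("c.pdf", "2005", "unit2", "Volume"),
    ("d.pdf", "2010", "unit2", "Volume"),
    ("e.pdf", "2003", "unit3", "Volume"),
]


def latest_per_unit(rows):
    """Return one tuple per unit id: the one with the largest year."""
    by_unit = operator.itemgetter(2)
    by_year = operator.itemgetter(1)
    # groupby only merges adjacent items, so sort by the grouping key first.
    ordered = sorted(rows, key=by_unit)
    # max is evaluated on each group separately, not on the whole list.
    return [max(group, key=by_year)
            for _, group in itertools.groupby(ordered, by_unit)]


latest = latest_per_unit(rows)
for row in latest:
    print(row)
```

The important part is that max is called on group (the current sublist) rather than on the full sorted list, so each unit gets its own maximum. If two rows in a group share the same year, max returns the first one it encounters.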
mgilson
  • I guess I don't understand how the second line's iteration through the sublist works. – mburkenysdot Jul 18 '12 at 19:12
  • @mburkenysdot -- I'm sorry. I don't understand your question. Could you try again? – mgilson Jul 18 '12 at 19:13
  • @mburkenysdot -- I've updated my answer. I'm not quite sure what you're trying to do, but I guessed. Let me know if this does or doesn't work. – mgilson Jul 18 '12 at 19:20
  • I just wanted to say I very much appreciate the help you've given me (this is the second question for which you've provided a helpful answer). – mburkenysdot Jul 18 '12 at 19:30
  • -- the apologies belong on my end. Hopefully I made myself clearer as to what I'm going for in my edits to the original question. – mburkenysdot Jul 19 '12 at 12:46
  • @mburkenysdot -- check my edit. My original version looked a little closer to what you wanted so I rolled it back and updated. Does this do what you want? – mgilson Jul 19 '12 at 13:24