2012-05-10 BRAD 10
2012-05-08 BRAD 40
2012-05-08 BRAD 60
2012-05-12 TOM 100
I want the output to be:
2012-05-08 BRAD|2|100
2012-05-10 BRAD|1|10
2012-05-12 TOM|1|100

I started with this code:

import os,sys
fo=open("meawoo.txt","w")
f=open("test.txt","r")
fn=f.readlines()
f.close()
for line in fn:
    line = line.strip()
    sline = line.split("|")
    p = sline[1].split(" ")[0],sline[2],sline[4]
    print p
    fo.writelines(str(p)+"\n")
fo.close()
o_read = open("meawoo.txt","r")
x_read=o_read.readlines()
from operator import itemgetter
x_read.sort(key=itemgetter(0))
from itertools import groupby
z = groupby(x_read, itemgetter(0))
print z
for elt, items in groupby(x_read, itemgetter(0)):
    print elt, items
    for i in items:
        print i

It would be very helpful if you could suggest some useful changes to my code. Thanks in advance.
  • Give a more accurate description in words of what you are trying to accomplish. Also you say you have one file but your code opens two files. – James Thiele Oct 05 '12 at 21:10
  • Just tell me: if I have a file with data such as "2012-05-10 BRAD 6", "2012-05-10 BRAD 4", "2012-05-08 BRAD 20", how would I get "2012-05-08 BRAD|1|20" and "2012-05-10 BRAD|2|10"? That is, group by DATE, then by NAME, and print NAME|COUNT(NAME)|SUM(VALUES). – user1720510 Oct 06 '12 at 07:18

1 Answer

The following code should print the data in the format you want (as far as I understand it):

d = {}
with open("testdata.txt") as f:
    for line in f:
        parts = line.split()
        if parts[0] in d:
            if parts[1] in d[parts[0]]:
                d[parts[0]][parts[1]][0] += int(parts[2])
            else:
                d[parts[0]][parts[1]] = [int(parts[2]), 0]
            d[parts[0]][parts[1]][1] +=1
        else:
            d[parts[0]] = {parts[1]: [int(parts[2]), 1]}
    for date in sorted(d):
        for name in sorted(d[date]):
            # index 1 is the count and index 0 is the sum; the requested order is NAME|COUNT|SUM
            print "%s %s|%d|%d" % (date, name, d[date][name][1], d[date][name][0])

I store every line in a dictionary keyed by date. Each value is another dictionary keyed by name, whose value is a two-element list: the first element is the running sum of the numbers for this name on this date, and the second is the number of lines that contributed to that sum. I then print the dictionary in the format you requested. Because the dates are written as YYYY-MM-DD, comparing them as strings gives the same order as comparing them as dates, so I can simply call sorted on the date strings. The names are sorted as well.
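The same bookkeeping can be sketched more compactly with collections.defaultdict keyed on a (date, name) tuple. This is only a minimal sketch under some assumptions: the function name summarize and the in-memory sample lines are made up for illustration, the [count, sum] layout is a choice of this sketch, and it is written for Python 3 (print as a function).

```python
from collections import defaultdict


def summarize(lines):
    """Return 'DATE NAME|COUNT|SUM' strings, sorted by date and then by name."""
    # totals[(date, name)] == [count, sum]; layout chosen for this sketch
    totals = defaultdict(lambda: [0, 0])
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        date, name, value = parts
        entry = totals[(date, name)]
        entry[0] += 1
        entry[1] += int(value)
    # YYYY-MM-DD dates sort chronologically when compared as plain strings
    return ["%s %s|%d|%d" % (date, name, count, total)
            for (date, name), (count, total) in sorted(totals.items())]


# Hypothetical sample data, taken from the lines shown in the question
sample = [
    "2012-05-10 BRAD 10",
    "2012-05-08 BRAD 40",
    "2012-05-08 BRAD 60",
    "2012-05-12 TOM 100",
]
for row in summarize(sample):
    print(row)
```

Using one tuple key instead of nested dictionaries removes the "is this date already present, is this name already present" branching, at the cost of not being able to look up all names for a single date directly.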

For a runnable example (adapted to read its input without a file), see http://ideone.com/rx3h2. It gives the output you asked for.
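Since the question started from itertools.groupby, here is a rough sketch of how that route could work instead of a dictionary. It is an alternative, not the code from this answer; the function name summarize_grouped and the helper by_date_and_name are hypothetical, and it targets Python 3. The key point is that groupby only merges adjacent items, so the records have to be sorted on the same key first.

```python
from itertools import groupby


def by_date_and_name(record):
    """Grouping key: the (date, name) part of a (date, name, value) record."""
    return record[0], record[1]


def summarize_grouped(lines):
    """Return 'DATE NAME|COUNT|SUM' strings using sort + groupby."""
    records = []
    for line in lines:
        parts = line.split()
        if len(parts) == 3:  # ignore blank or malformed lines
            records.append((parts[0], parts[1], int(parts[2])))
    # groupby merges only neighbouring items, so sort on the grouping key first
    records.sort(key=by_date_and_name)
    rows = []
    for (date, name), group in groupby(records, key=by_date_and_name):
        values = [record[2] for record in group]
        rows.append("%s %s|%d|%d" % (date, name, len(values), sum(values)))
    return rows


# Sample lines taken from the asker's follow-up comment
for row in summarize_grouped(["2012-05-10 BRAD 6",
                              "2012-05-10 BRAD 4",
                              "2012-05-08 BRAD 20"]):
    print(row)
```

Parsing each line into a tuple before sorting matters here: grouping raw lines with itemgetter(0), as in the question, would group by the first character of each line rather than by date and name.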

halex