I have a list like this:

myList = [ ['jan','423','523','645'],
            ['jan','654','754','765'],
            ['nov','756','087','140'],
            ['nov','233','123','032'],
            ['apr','654','786','223'] ]

I want to go through the list and group the sublists of myList by their first element ('jan', 'nov', ...) into a dict whose keys are those first elements, with the remaining values collected column by column:

myDict = {'jan':[ 
                ['423','654'], # average
                ['523','754'], # average
                ['645','765'] # average
                ],
        'nov':[ 
                ['756','233'], # average
                ['087','123'], # average
                ['140','032'] # average
                ],
        'apr':[ 
                ['654'], # average
                ['786'], # average
                ['223'] # average
                ],
        }

Then I want to calculate the average of each inner list in that dict, like this:

myDict = {'jan':[ 
                ['423','654'], # average
                ['523','754'], # average
                ['645','765'] # average
                ],.....

NOTE: this is just sample data. I have hundreds of elements in each sublist, like [[34,654,756,8,675,75,64,3,45,....n],[sublist-2..n],[sublist-3....n]], but the length of each sublist is fixed.

MYCODE:

myList
db = {}    
for i in myList:

    human = i[0]    
    newlist = i[1:]

    # print pprint(db)
    columns = []
    counter = 0

    while counter < len(newlist):
        if db.has_key(human):
            db[human][counter].append(newlist[counter])
        else:
            columns.append([newlist[counter]])
            db[human] = columns
        counter += 1

The code below worked well when I had only 2 items:

human = i[0]
col_1 = i[1]
col_2 = i[2]

if db.has_key(human):
    db[human][0].append(col_1)
    db[human][1].append(col_2)
else:
    db[human] = [ [col_1], [col_2] ]

print
pprint(db)
print columns

# function to calculate average
def getAvg(column):
    # Convert each string value to float and return the mean (0 for an empty column).
    if not column:
        return 0
    total = 0
    for val in column:
        total += float(val)
    return total / len(column)

col_1_avg = getAvg(col_1)
col_2_avg = getAvg(col_2)

result = human + ',' + str(getAvg(col_1)) + ',' +str(getAvg(col_2))
ArrC
  • Look at http://stackoverflow.com/questions/21674331/group-by-multiple-keys-and-summarize-average-values-of-a-list-of-dictionaries/21674941#21674941 – tk. Feb 10 '14 at 10:59
  • Please don't use `dict.has_key` to check existence of a key, it has been deprecated. Use `if k in my_dict:...` – Ashwini Chaudhary Feb 10 '14 at 11:05
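
Following the comment above, here is a minimal sketch of the grouping loop that uses `in` instead of `dict.has_key` and handles any fixed number of columns. It assumes Python 3, and the name `group_columns` is hypothetical, not from the original post.

```python
def group_columns(rows):
    """Group rows by their first element, collecting the other values column by column."""
    db = {}
    for row in rows:
        key, values = row[0], row[1:]
        if key not in db:
            # First time this key is seen: create one empty list per column.
            db[key] = [[] for _ in values]
        for index, value in enumerate(values):
            db[key][index].append(value)
    return db


sample_rows = [
    ['jan', '423', '523', '645'],
    ['jan', '654', '754', '765'],
    ['nov', '756', '087', '140'],
    ['nov', '233', '123', '032'],
    ['apr', '654', '786', '223'],
]

grouped = group_columns(sample_rows)
print(grouped['jan'])  # [['423', '654'], ['523', '754'], ['645', '765']]
```

Creating all column lists up front when a key is first seen avoids indexing into a column that does not exist yet.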

1 Answer

from collections import defaultdict

myList = [['jan','423','523','645'],
          ['jan','654','754','765'],
          ['nov','756','087','140'],
          ['nov','233','123','032'],
          ['apr','654','786','223']]

# Group the rows (minus the key) under their first element.
mydict = defaultdict(list)
for each in myList:
    mydict[each[0]].append(each[1:])

# Transpose each group so that values are collected column by column.
datadict = {}
for each in mydict:
    datadict[each] = list(zip(*mydict[each]))
print(datadict)
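
The question also asks for the average of each column. Here is a minimal sketch of that step, assuming Python 3 and the same defaultdict-and-zip grouping. The name `column_averages` is hypothetical, not from the original answer.

```python
from collections import defaultdict


def column_averages(rows):
    """Return {key: [average of each column]} for rows shaped like [key, v1, v2, ...]."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[0]].append(row[1:])

    averages = {}
    for key, group in groups.items():
        # zip(*group) transposes the grouped rows into columns.
        averages[key] = [
            sum(float(value) for value in column) / len(column)
            for column in zip(*group)
        ]
    return averages


rows = [
    ['jan', '423', '523', '645'],
    ['jan', '654', '754', '765'],
    ['nov', '756', '087', '140'],
    ['nov', '233', '123', '032'],
    ['apr', '654', '786', '223'],
]

result = column_averages(rows)
print(result['jan'])  # [538.5, 638.5, 705.0]
```

To produce a comma-separated line per key as in the question, the averages can be joined, for example `key + ',' + ','.join(str(a) for a in result[key])`.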
Smita K