0

I have a long list of values. I want to define a function that takes the list, calculates the average of every 24 consecutive values, and returns the averages as a list. How do I do this? The list has 8760 elements, so the returned list should have 8760/24 = 365 elements.

hourly_temp = ['-0.8', '-0.7', '-0.3', '-0.3', '-0.8',
'-0.5', '-0.7', '-0.6', '-0.7', '-1.2', '-1.7...] #This goes on, it's 8760 elements

def daily_mean_temp(hourly_temp):

    first_24_elements = hourly_temp[:24] #First 24 elements in the list

Is this correct? I get an error saying: TypeError: cannot perform reduce with flexible type

def daily_mean_temp(hourly_temp):
    averages = [float(sum(myrange))/len(myrange)
                for myrange in zip(*[iter(hourly_temp)]*24)]
    return averages
user3297266

5 Answers

2

Assuming that you want independent groups, you can use the grouper itertools recipe:

from itertools import izip_longest  # Python 2; on Python 3 this is zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args)

And then easily get the average of each group:

averages = [sum(group)/float(len(group)) for group in grouper(data, 24)]

Edit: given that your data appears to be a list of strings, I would suggest you convert to floats first using map:

data = map(float, hourly_temp)
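
For reference, here is a minimal Python 3 sketch that puts the two steps together. It assumes `zip_longest` in place of `izip_longest`, and the `hours_per_day` parameter and the length check are illustrative additions, not part of the recipe:

```python
from itertools import zip_longest  # Python 3 name for izip_longest


def grouper(iterable, n, fillvalue=None):
    """Collect data into fixed-length chunks, padding the last one with fillvalue."""
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)


def daily_mean_temp(hourly_temp, hours_per_day=24):
    """Return one mean per complete block of hours_per_day readings."""
    data = [float(t) for t in hourly_temp]  # strings such as '-0.8' become floats
    if len(data) % hours_per_day:
        # Assumption: refuse incomplete days, since None padding would break sum().
        raise ValueError("length must be a multiple of hours_per_day")
    return [sum(group) / len(group) for group in grouper(data, hours_per_day)]


# Hypothetical sample: two days, all 0.0 on day one and all 1.0 on day two.
sample = ['0.0'] * 24 + ['1.0'] * 24
print(daily_mean_temp(sample))  # [0.0, 1.0]
```

With the 8760-element list from the question, this should return 365 values.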
jonrsharpe
1
averages = [sum( map(float, myrange) )/len(myrange) 
            for myrange in zip(*[iter(my_big_list)]*range_size)]

This is a pretty neat way to do it. Note that it drops any trailing elements when the list length is not a multiple of the range size.

If you need to keep a shorter chunk at the end (e.g. a chunk size of 10 with a list of 17 items leaves 7 left over), pad with izip_longest and discard the padding:

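from itertools import izip_longest as zip2  # Python 2; zip_longest on Python 3
# Drop the None padding explicitly so that falsy values such as 0 are kept.
chunks = [[x for x in myrange if x is not None]
          for myrange in zip2(*[iter(my_big_list)]*range_size)]
averages = [sum(map(float, chunk))/len(chunk) for chunk in chunks]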
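
An alternative that avoids padding altogether is to pull chunks off an iterator with `islice`. This is only a sketch for Python 3; the function name `chunked_means` is made up for illustration:

```python
from itertools import islice


def chunked_means(values, size):
    """Yield the mean of each consecutive chunk; the last chunk may be shorter."""
    it = iter(values)
    while True:
        # islice takes up to `size` items, so the final chunk is simply shorter.
        chunk = [float(v) for v in islice(it, size)]
        if not chunk:
            return
        yield sum(chunk) / len(chunk)


print(list(chunked_means(['1', '2', '3', '4', '5'], 2)))  # [1.5, 3.5, 5.0]
```

Because no fill value is introduced, the last average is computed over the items that actually exist.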
Joran Beasley
  • I'd rather not use `range`, a name of a built-in function, as a variable name, though _here_ its usage does not lead to problems. – 9000 Mar 13 '14 at 17:56
  • Since the list comprehension leaks the `range` variable into the surrounding scope, it'll cause problems if the surrounding code needs the `range` function. – user2357112 Mar 13 '14 at 17:58
  • very valid criticism (was not thinking)... changed :) – Joran Beasley Mar 13 '14 at 17:58
  • Although a bit strange, an alternative if you wanted to handle uneven chunks (or just not use the `grouper` recipe for a change :p), then you can use: `[sum(el) / len(el) for el in iter(lambda it=iter(my_big_list): list(islice(it, range_size)), [])]` – Jon Clements Mar 13 '14 at 18:02
  • Note that an uneven block using `izip_longest` will pad with `None` and thus will cause the `sum()` to fail... You could change `fillvalue=` to be 0, but that'll distort the average. The best you could do to make it correct would be to filter None objects from groups – Jon Clements Mar 13 '14 at 18:07
  • whatever size you want your sub blocks to be? 10 readings? 25 readings? 100 readings? etc... – Joran Beasley Mar 13 '14 at 18:28
  • I get an error saying: TypeError: cannot perform reduce with flexible type – user3297266 Mar 13 '14 at 19:12
  • `float(sum(myrange))` is probably not what you want. – njzk2 Mar 14 '14 at 15:59
  • why not @njzk2? if myrange is all ints then the sum would be ints ... and len is always an int ... so int divided by int = int (at least in py2) oh when I answered this I swear his initial list was actually numbers ... (or I guess he didnt provide any sample code) – Joran Beasley Mar 14 '14 at 16:45
  • @JoranBeasley : my understanding is that `my_big_list` contains elements like `'-0.8', '-0.7', ...`, hence the `sum(map(float, ...))` – njzk2 Mar 14 '14 at 17:09
  • yeah I changed it (when I answered this, OP had no example code and it sounded like a list of actual numbers) – Joran Beasley Mar 14 '14 at 18:31
1

Assuming your values are strings, as you show above, and that you have NumPy handy, this should be fast:

import numpy as np
averages = [x.mean() for x in np.array_split(
        [float(x) for x in hourly_temp], 365)]

And if you might have NaNs:

averages = [x[~np.isnan(x)].mean() for x in np.array_split(
        [float(x) for x in hourly_temp], 365)]

And if you start with proper floats:

averages = [x[~np.isnan(x)].mean() for x in np.array_split(hourly_temp, 365)]
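
If the length is an exact multiple of 24, a reshape gives the same result in one vectorized step. The sketch below assumes a reasonably recent NumPy; the function name and the `hours_per_day` parameter are illustrative. Converting to a float dtype first also avoids reducing over an array of strings, which is the likely source of the `cannot perform reduce with flexible type` error:

```python
import numpy as np


def daily_mean_temp_np(hourly_temp, hours_per_day=24):
    """Return per-day means; reshape raises ValueError if days are incomplete."""
    arr = np.asarray(hourly_temp, dtype=float)  # numeric strings become floats
    # One row per day, then average across each row.
    return arr.reshape(-1, hours_per_day).mean(axis=1).tolist()


# Hypothetical sample: two days of constant readings.
print(daily_mean_temp_np(['0.0'] * 24 + ['2.0'] * 24))  # [0.0, 2.0]
```

If the data may contain NaNs, `np.nanmean(..., axis=1)` on the reshaped array is one way to skip them.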
flexatone
0

Something along these lines seems to work:

[ sum(hourly_temp[i:i+24]) / len(hourly_temp[i:i+24]) for i in xrange(0, len(hourly_temp), 24) ]
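
On Python 3 the same slicing idea might look like the sketch below, which assumes Python 3.8+ for `statistics.fmean` and converts the string readings first. The function name and `size` parameter are illustrative:

```python
from statistics import fmean  # available from Python 3.8


def means_by_slicing(hourly_temp, size=24):
    """Average each slice of `size` items; a short final slice is averaged as-is."""
    data = [float(t) for t in hourly_temp]  # the question's list holds strings
    return [fmean(data[i:i + size]) for i in range(0, len(data), size)]


print(means_by_slicing(['1', '2', '3'], size=2))  # [1.5, 3.0]
```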
twalberg
  • `len(hourly_temp[i:i+24])` is quite likely to be 24. – njzk2 Mar 14 '14 at 15:58
  • @njzk2 except when you hit the end of the list, if there wasn't a multiple of 24 items in the original list... – twalberg Mar 14 '14 at 16:11
  • true. I was assuming `I have 8760 elements in the list` is always true, but covering the larger case is indeed better. – njzk2 Mar 14 '14 at 16:12
0

Using this grouper recipe, it's pretty easy (obviously, I've synthesized the temps list):

#!/usr/bin/python

import itertools as it

temps = range(96)

def grouper(iterable, n, fillvalue=None):
    args = [iter(iterable)] * n
    return it.izip_longest(*args, fillvalue=fillvalue)

# float() avoids integer division on Python 2, which would truncate 11.5 to 11.
daily_averages = [sum(x)/float(len(x)) for x in grouper(temps, 24)]
yearly_average = sum(daily_averages)/len(daily_averages)

print(daily_averages, yearly_average)
Community
Emmet