
We've got a set of recarrays of data for individual days - the first attribute is a timestamp and the rest are values.

Several of these:

    ts                a    b    c
    2010-08-06 08:00, 1.2, 3.4, 5.6
    2010-08-06 08:05, 1.2, 3.4, 5.6
    2010-08-06 08:10, 1.2, 3.4, 5.6
    2010-08-06 08:15, 2.2, 3.3, 5.6
    2010-08-06 08:20, 1.2, 3.4, 5.6

We'd like to produce an array of the averages of each of the values (as if you laid all of the days' data on top of each other and averaged all of the values that line up). The times of day match up across all days, so we can do it by creating a result recarray with the timestamps and all the other columns set to 0, then doing something like:

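    # result: a recarray with the shared timestamps and a, b, c initialised to 0
    for day in day_data:
        result.a += day.a
        result.b += day.b
        result.c += day.c

    # turn the per-column sums into averages
    result.a /= len(day_data)
    result.b /= len(day_data)
    result.c /= len(day_data)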

It seems like a better way would be to convert each day to a 2D array containing just the numbers (lopping off the timestamps), then average them all element-wise in one operation, but we can't find a way to do this: it always comes out as a 1D array of records.

Does anyone know how to do this?

babbageclunk

1 Answer


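There are several ways to do this. One way is to select the value columns of the recarray, reinterpret them as floats with `view`, then reshape the result into a 2D array. This relies on multi-field indexing returning a packed copy, which was NumPy's behaviour before version 1.16:

    new_data = data[['a','b','c']].view(np.float64).reshape((data.size, 3))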
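On NumPy 1.16 and later, multi-field indexing returns a view that keeps the original record layout (including the bytes of the `ts` field), so the `view`/`reshape` line above no longer works as written. A minimal sketch of the replacement NumPy added in 1.16 for this purpose, `numpy.lib.recfunctions.structured_to_unstructured`; the sample `data` array and its `datetime64` timestamp field are made up for illustration:

```python
import numpy as np
from numpy.lib.recfunctions import structured_to_unstructured

# Hypothetical single day of data: a timestamp field plus three value fields.
data = np.array(
    [
        (np.datetime64("2010-08-06T08:00"), 1.2, 3.4, 5.6),
        (np.datetime64("2010-08-06T08:05"), 1.2, 3.4, 5.6),
        (np.datetime64("2010-08-06T08:10"), 2.2, 3.3, 5.6),
    ],
    dtype=[("ts", "datetime64[m]"), ("a", "f8"), ("b", "f8"), ("c", "f8")],
)

# Select only the value fields, then convert them to an ordinary (n, 3) float array.
new_data = structured_to_unstructured(data[["a", "b", "c"]])
print(new_data.shape)  # (3, 3)
```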

Alternatively, you might consider something like this (negligibly slower, but more readable):

    new_data = np.vstack([data[item] for item in ['a','b','c']]).T
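Once each day is a plain `(n, 3)` array, the averaging from the question becomes a single operation: stack the days into a `(days, n, 3)` array and take the mean over the first axis. A minimal sketch, assuming every day has the same times in the same order (as the question states); the `make_day` helper and the sample values are hypothetical:

```python
import numpy as np

FIELDS = ["a", "b", "c"]
DTYPE = [("ts", "datetime64[m]"), ("a", "f8"), ("b", "f8"), ("c", "f8")]


def make_day(date, rows):
    """Hypothetical helper: build one day's recarray from (HH:MM, a, b, c) rows."""
    return np.rec.array(
        [(np.datetime64(f"{date}T{hhmm}"), a, b, c) for hhmm, a, b, c in rows],
        dtype=DTYPE,
    )


day_data = [
    make_day("2010-08-06", [("08:00", 1.0, 3.0, 5.0), ("08:05", 2.0, 4.0, 6.0)]),
    make_day("2010-08-07", [("08:00", 3.0, 5.0, 7.0), ("08:05", 4.0, 6.0, 8.0)]),
]

# Convert each day to a plain (n, 3) float array (the vstack approach above),
# stack the days into a (days, n, 3) array, and average element-wise across days.
stacked = np.array([np.vstack([day[f] for f in FIELDS]).T for day in day_data])
means = stacked.mean(axis=0)

# Optionally put the averages back into a recarray, reusing the first day's timestamps.
result = np.rec.fromarrays(
    [day_data[0].ts] + [means[:, i] for i in range(len(FIELDS))],
    names=["ts"] + FIELDS,
)
print(means)
```

`mean(axis=0)` replaces the per-column accumulate-and-divide loop, and it keeps working unchanged if more value fields are added to `FIELDS`.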

Also, consider looking into pandas for operations like these; it makes working with heterogeneous (mixed-type) tabular data much easier.

Joe Kington
  • That's great, thanks! I'm still struggling to get used to doing things on the arrays as a whole; my instinct is to do things to elements individually. One note from my testing: while the `.view(np.float)` part doesn't make a copy, the fancy slicing does. – babbageclunk Aug 12 '10 at 10:02
  • @Joe: If I'm not mistaken, @wilberforce is right about the copy: `data[['a','b','c']].base` is None, so this means that it owns its data and does not inherit it from `data`. This makes sense, as the fields are generally not contiguous. If you confirm this, it would be nice to update your answer. :) – Eric O. Lebigot Jul 29 '13 at 09:32
  • @EOL - You're absolutely right! (I don't know what I was thinking at the time...) – Joe Kington Jul 30 '13 at 01:49
  • @EOL - Also, indexing structured arrays with things like `data[['a', 'b', 'c']]` will return a view in future versions of numpy: https://github.com/numpy/numpy/pull/350/files As you mentioned, it doesn't at the moment, and hasn't in the past, though. – Joe Kington Jul 30 '13 at 02:07
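The behaviour change discussed in this last comment did eventually arrive: since NumPy 1.16, indexing a structured array with a list of field names returns a view rather than a copy. A minimal sketch to check this on your own installation, using a small made-up array; `np.shares_memory` reports whether the selection aliases the original buffer:

```python
import numpy as np

# Made-up structured array with the same layout as the question's data.
data = np.zeros(3, dtype=[("ts", "datetime64[m]"), ("a", "f8"), ("b", "f8"), ("c", "f8")])

subset = data[["a", "b", "c"]]  # multi-field index

# On NumPy >= 1.16 this is a view into `data`, not an independent copy.
print(np.shares_memory(data, subset))

# Writing through the view therefore modifies the original array.
subset["a"] = 1.5
print(data["a"])
```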