
Given a 2D set of data [Time, Value], I'd like to split it into like groups, but in time-ordered chunks. I am using both numpy and pandas already, so a method for either is great.

Original:

Foo = np.array([[0,A],[1,A],[2,A],[3,B],[4,B]
               [5,A],[6,A],[7,B],[8,B],[9,B],[10,A]....]])

Split into:

bar = np.array([[0,A],[1,A],[2,A]])
baz = np.array([[3,B],[4,B]])
qux = np.array([[5,A],[6,A]])
arr = np.array([[7,B],[8,B],[9,B]])
wiz = np.array([[10,A],......])
user3427374
  • Are `A` and `B` supposed to be strings, or actual values? Can you provide an example value for them? If they are values, note that the final result will not print out as you have it at the bottom: the actual value for `A` or `B` would be displayed, not their symbolic variable names. – ely Feb 21 '15 at 23:07

1 Answer


Assuming that you mean for A and B to be values, you can just use itertools.groupby, provided your grouping logic is to place each contiguous run of a value into its own group.

Concretely (including fixing a bracket and comma error in your example code, and adding some dummy values for A and B):

import numpy as np
from itertools import groupby

A = 1.0
B = 2.0
Foo = np.array([[0, A], [1, A], [2, A], [3, B], [4, B],
                [5, A], [6, A], [7, B], [8, B], [9, B], [10, A]])

# groupby yields one group per contiguous run of identical keys,
# keyed here on the second column (the value) of each row.
groups = [np.array(list(v)) for k, v in groupby(Foo, lambda x: x[1])]

Now what you call bar will be groups[0], and so on. If you want to give them names automatically, it's advisable not to try to do this at the top level with some kind of locals() or globals() trickery, but instead just list out the names and use a dict:

names = ['bar', 'baz', 'qux', 'arr', 'wiz']
named_groups = {names[i]:groups[i] for i in range(len(groups))}

Now named_groups['bar'] returns what you used to just call bar.
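
For instance, continuing from the code above (a minimal sketch; the printed values assume the dummy A = 1.0 and B = 2.0):

# Look up a group by name instead of through a separate top-level variable.
print(named_groups['bar'])        # array([[0., 1.], [1., 1.], [2., 1.]])
print(named_groups['baz'][:, 0])  # time column of the second group: array([3., 4.])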

Alternatively, if you can guarantee the precise number of groups, you can use tuple unpacking to name them all in one step like this:

(bar,
 baz,
 qux,
 arr,
 wiz) = [np.array(list(v)) for k, v in groupby(Foo, lambda x: x[1])]

(Note: I've never gotten a great answer about what PEP 8 recommends as best practice when there are a lot of (possibly verbosely named) tuple elements to unpack on the left side of =.)

This still lets you have the groups bound to top-level variable names, but rightfully forces you to be explicit about how many such variables there are, avoiding the bad practice of trying to dynamically assign variables on the fly.

ely
  • Is there an argument to ignore some sets? For example, if the iteration finds another value C (yes, these are all numerical values), skip it and continue grouping only the A's and B's. – user3427374 Feb 22 '15 at 16:55
  • Maybe a better question is: where can I learn about manipulating this kind of beast? Have we created a list? A 3D numpy array? I'm not sure what to research. I can access each individual array like groups[0], but I have 120 of them that I'd like to operate on, and I only know how to access them manually. – user3427374 Feb 22 '15 at 18:34
  • If you have some object named `x` you can inspect what type it is with `type(x)` and go from there. If you have a `list`, why not write your processing logic as a separate function that takes a single group as an argument, and then use `map` to spray it across the list of groups? If you store your data as a `pandas` `DataFrame` then you can set the column of `A`/`B` values as the index, or group by that column (which will be a bit different than the contiguous-runs grouping logic above, which is why I opted not to use `pandas` in my answer) ... there are many possibilities. – ely Feb 22 '15 at 19:17
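
To illustrate the suggestions in that last comment (and the earlier question about skipping another value C), here is a minimal sketch; the summarize function, the column names, and the dummy values are made up for the example:

import numpy as np
import pandas as pd
from itertools import groupby

A, B, C = 1.0, 2.0, 3.0   # dummy values, as above
Foo = np.array([[0, A], [1, A], [2, C], [3, B], [4, B], [5, A]])

# Drop rows whose value is neither A nor B, then group contiguous runs.
keep = np.isin(Foo[:, 1], [A, B])
groups = [np.array(list(v)) for k, v in groupby(Foo[keep], lambda x: x[1])]

# Write the per-group logic once and map it across every group.
def summarize(group):             # hypothetical processing function
    return group[:, 0].mean()     # e.g. the mean time of each group

results = list(map(summarize, groups))

# pandas alternative: groups by value, not by contiguous run.
df = pd.DataFrame(Foo, columns=['time', 'value'])
by_value = {k: g for k, g in df.groupby('value')}

Note that filtering out the C rows before grouping will merge two runs of A that were separated only by C's; if that matters, group first and then drop the unwanted groups.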