This is what my dataframe looks like:
Timestamp CAT
0 2016-12-02 23:35:28 200
1 2016-12-02 23:37:43 200
2 2016-12-02 23:40:49 300
3 2016-12-02 23:58:53 400
4 2016-12-02 23:59:02 300
...
This is what I'm trying to do in Pandas (notice the timestamps are grouped):
Timestamp BINS 200 300 400 500
2016-12-02 23:30 2 0 0 0
2016-12-02 23:40 0 1 0 0
2016-12-02 23:50 0 1 1 0
...
I'm trying to create bins of 10-minute time intervals so I can make a bar graph. And have the columns as the CAT values, so I can have a count of how many times each CAT occurs within that time bin.
What I have so far can create the time bins:
def create_hist(df, timestamp, freq, fontsize, outfile):
""" Create a histogram of the number of CATs per time period."""
df.set_index(timestamp,drop=False,inplace=True)
to_plot = df[timestamp].groupby(pandas.TimeGrouper(freq=freq)).count()
...
But my issue is I cannot for the life of me figure out how to group by both the CATs and by time bins. My latest try was to use df.pivot(columns="CAT")
before doing the groupby but it just gives me errors:
def create_hist(df, timestamp, freq, fontsize, outfile):
""" Create a histogram of the number of CATs per time period."""
df.pivot(columns="CAT")
df.set_index(timestamp,drop=False,inplace=True)
to_plot = df[timestamp].groupby(pandas.TimeGrouper(freq=freq)).count()
...
Which gives me: ValueError: Buffer has wrong number of dimensions (expected 1, got 2)