0

I am trying to plot a simple date histogram(just count occurences per N minutes) using pandas. However the best I have reach is the following:

import pandas as pd
df = pd.read_csv('mydata.csv',sep=' ',usecols=[0,1])
df.columns = ['smdate', 'smtime']
df= pd.to_datetime(df['smtime'])
print(df)
df.groupby(df.dt.minute).count().plot(kind="bar",figsize=(50,10))

Sample output:

0        2020-07-07 00:00:07.538
1        2020-07-07 00:00:09.278
2        2020-07-07 00:00:09.292
3        2020-07-07 00:00:10.682
4        2020-07-07 00:00:14.198
                   ...          
262516   2020-07-07 15:54:44.056
262517   2020-07-07 15:54:44.270
262518   2020-07-07 15:54:44.450
262519   2020-07-07 15:54:44.697
262520   2020-07-07 15:54:45.210
Name: smtime, Length: 262521, dtype: datetime64[ns]

Plot enter image description here

My question is how can I group my frame best(i.e. by 10-20 minutes)? Also what is the best way to plot so many occurrences so that I can visualize them per the interval mentioned above(per N minutes)?

trikelef
  • 2,192
  • 1
  • 22
  • 39
  • I don't have a pandas answer, but you could probably make a new column with some value indicating which interval (ie 1:first interval, 2:next interval and so on) and groupby as such. as far as visualising I'm not sure but once you get it grouped out it should be easy to arrive at a solution. – nos codemos Jul 07 '20 at 11:40

1 Answers1

2

I think you are looking for pandas Grouper.
It allows you to specify any frequency or interval needed.

Here is a working example with 10 minutes interval :

import pandas as pd
df = pd.read_csv('mydata.csv',sep=';',usecols=[0,1])
df.columns = ['smdate', 'smtime']

df['smtime'] = pd.to_datetime(df['smtime'])

df.groupby(pd.Grouper(key='smtime', freq='10Min')).count().plot(kind="bar",figsize=(50,10))

Here, I kept the initial dataframe structure ; I couldn't get it to work with the datetime Series object (Grouper function tries to work on index and not values of the serie). I tried axis parameter without success. I would be glad if anyone could improve my answer working directly with the Series.

Not working example :

import pandas as pd
df = pd.read_csv('mydata.csv',sep=';',usecols=[0,1])
df.columns = ['smdate', 'smtime']

df = pd.to_datetime(df['smtime'])

df.groupby(pd.Grouper(freq='10Min')).count().plot(kind="bar",figsize=(50,10))
chateaur
  • 346
  • 1
  • 13