
I have a pandas dataframe with a column containing timestamps (start) and another column containing timedeltas (duration) to indicate duration.

I'm trying to plot a bar chart showing these durations with their left edges at the timestamps. I haven't found any way of doing it online. Is there a way to achieve this?

So far, this is what I have, which doesn't work:

    height = np.ones(df.shape[0])
    width = [x for x in df['duration']]
    plt.bar(left=df['start'], height=height, width=width)

Edit: I have updated the width as follows but that also doesn't solve this problem:

    width = [x.total_seconds()/(60*1200) for x in df['duration']]

I'm interested in knowing whether datetime.timedelta objects can be used in width, since datetime objects can be used as x-axis. And if not, what alternatives are there?

Edit #2:

This may not be the exact answer to my question, but it served the purpose I had in mind. For whoever is interested, this is the approach I finally took (I used start and duration to compute end for this purpose):

    for i in range(df.shape[0]):
        # df.ix has been removed from pandas; df.loc does the same lookup with the default integer index
        start, end = df.loc[i, 'start'], df.loc[i, 'end']
        plt.axvspan(start, end, facecolor='g', alpha=0.3)
        plt.axvline(x=start, ymin=0.0, ymax=1.0, color='r', linewidth=1)
        plt.axvline(x=end, ymin=0.0, ymax=1.0, color='r', linewidth=1)
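
The `end` value mentioned above is just `start` plus `duration`. A minimal sketch with plain `datetime` objects and made-up sample values (the names `starts`, `durations` and `spans` are illustrative only, not from my actual data):

```python
from datetime import datetime, timedelta

# Made-up sample data standing in for the 'start' and 'duration' columns.
starts = [datetime(2014, 10, 22, 0, 0), datetime(2014, 10, 22, 1, 0)]
durations = [timedelta(minutes=5), timedelta(minutes=30)]

# Each span runs from its start to start + duration.
spans = [(start, start + duration) for start, duration in zip(starts, durations)]
```

With a DataFrame, the same addition can be written column-wise as `df['end'] = df['start'] + df['duration']`.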
oxtay

1 Answer


If the type of `df.duration[0]` is `pandas.tslib.Timedelta` (available as `pandas.Timedelta`) and your timestamps are days apart, you could use:

    width = [x.days for x in df.duration]

and this will produce the chart.

Otherwise, use the `total_seconds` method, as outlined in this answer.
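
As a rough sketch of the `total_seconds` route: matplotlib's date axes count in days, so a duration can be turned into a bar width by dividing its seconds by 86400. The helper name `timedelta_to_days` is made up for illustration, and the sample durations are assumptions, not data from the question.

```python
from datetime import timedelta

SECONDS_PER_DAY = 24 * 60 * 60  # matplotlib date numbers are measured in days


def timedelta_to_days(duration):
    """Return a duration as a fraction of a day (hypothetical helper)."""
    return duration.total_seconds() / SECONDS_PER_DAY


# Made-up sample durations; pandas Timedelta objects also provide total_seconds().
durations = [timedelta(minutes=30), timedelta(hours=6), timedelta(days=1)]
widths = [timedelta_to_days(d) for d in durations]
```

The resulting list could then be passed as `width` to `plt.bar`, with the start timestamps as the bar positions.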

UPDATE:

If the data is hourly with timedeltas in minutes then one way to have the chart you want is like this:

    import datetime as dt
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt

    dates = pd.date_range(start=dt.date(2014, 10, 22), periods=10, freq=pd.Timedelta(hours=1))
    df = pd.DataFrame({'start': dates, 'duration': np.random.randint(1, 10, len(dates))},
                      columns=['start', 'duration'])
    # Convert the integer minutes to timedeltas.
    df['duration'] = pd.to_timedelta(df['duration'], unit='m')
    df.loc[1, 'duration'] = pd.Timedelta(minutes=30)  # To clearly see the effect at 01:00:00
    # mpl treats bar widths on a date axis as days, hence the conversion from seconds to days.
    width = [x.total_seconds() / 86400.0 for x in df['duration']]
    # Older matplotlib named the first argument `left`; align='edge' puts each left edge at `start`.
    plt.bar(df['start'], height=[1] * len(df), width=width, align='edge')
    ax = plt.gca()
    _ = plt.setp(ax.get_xticklabels(), rotation=45)

This produces a chart like this:

(image of the resulting chart)

Primer
  • Thank you. Actually, they are only hours or minutes apart, and I've seen the other answer with `total_seconds`, but it won't work. The reason is that you have to scale it to hours and minutes, but since the lengths of those times are different, this scaling will not be accurate and has to be manually adjusted every time. So right now, the width is `width = [x.total_seconds()/(60*1200) for x in df['duration']]`. Since `Pyplot` handles `datetime` objects for the x-axis, I was expecting `datetime.timedelta` to be recognized and handled for width, and I'm surprised it isn't. – oxtay Oct 22 '14 at 16:29
  • @oxtay Yes, it would be much nicer if `matplotlib` natively understood TimeDeltas or Offsets in both `bar` and `broken_barh` plots, but until then I see only somewhat cumbersome ways to achieve what you want. I have updated my post above to show an example with hours / minutes. – Primer Oct 22 '14 at 21:24
  • @Primer Be aware this doesn't work out of the box because you're missing the datetime import for the declaration of `dates`. There also may be some sort of pandas version trickiness, because I end up with `numpy.timedelta64`s which don't have the same methods that `datetime.timedelta`s do... – Ajean Oct 22 '14 at 23:22
  • @Ajean Thanks, I added the missing import statement. And yes, I also noted the pandas "trickiness" with regard to `pandas.tslib.Timedelta` (which it initially was when the dataframe was created). That's why `width` is made through a list comprehension. The "native" approach with `df.duration.apply(lambda x: x.minutes)` doesn't "see" `pandas.tslib.Timedelta`, thus giving the error: `'numpy.timedelta64' object has no attribute 'minutes'`. – Primer Oct 23 '14 at 11:40