I have a list of dates that span several (hundred) years. I'd like to make a histogram that has 366 buckets, one for each day of the year, with the x-axis labelled in a legible way that allows me to see which date is which (I'm expecting a dip for February 29, for example).

I've made the following histogram, but easy-to-read X-axis date labels would be awesome. The following code seems cumbersome but gets me what I want (without the X-axis labels):

from datetime import date, datetime, timedelta
from collections import Counter
import pylab


def plot_data(data):
    """data is a list of dicts that contain a field "date" with a datetime."""

    def get_day(d):
        return d.strftime("%B %d")  # e.g. January 01

    days = []
    n = 366
    start_date = date(2020, 1, 1)  # pick a leap year
    for i in range(n):
        d = start_date + timedelta(days=i)
        days.append(get_day(d))

    counts = Counter(get_day(d['date']) for d in data)
    
    Y = [counts[d] for d in days]  # Counter returns 0 for days with no data (counts.get(d) would return None)
    X = list(range(len(days)))

    pylab.bar(X, Y)
    pylab.xlim([0, n])

    pylab.title("Dates day of year")
    pylab.xlabel("Day of Year (0-366)")
    pylab.ylabel("Count")
    pylab.savefig("Figure 1.png")

[Figure: Day of Year Chart with bad labels]

Any help to shorten this up and make for more flexible and legible x-axis dates would be much appreciated!


UPDATE

I've incorporated the ideas below into the following gist, which produces output that looks like this:

[Figure: Day of Year Chart with nice labels]

Jason Sundram
  • You are not going to fit 366 strings on the x-axis (even with `ax.set_xticklabels([str(d) for d in days], rotation='vertical')`). Which dates / breaks in the data do you care about? – Paul Brodersen Jun 15 '20 at 16:12
  • @PaulBrodersen -- great point! -- I think adding monthly labels with weekly breaks inside of them is along the lines of what I'm looking for. – Jason Sundram Jun 15 '20 at 19:19

2 Answers

Try this code:

# import section
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as md
import numpy as np
from datetime import date
from itertools import product

# generate a dataframe like yours
# (days 1-28 only: the '%B %d' parse below defaults to year 1900, which has no February 29)
dates = [date(2020, m, d).strftime("%B %d") for m, d in product(range(1, 13), range(1, 29))]
value = np.abs(np.random.randn(len(dates)))
data = pd.DataFrame({'date': dates,
                     'value': value})
data.set_index('date', inplace = True)

# convert index from str to date
data.index = pd.to_datetime(data.index, format = '%B %d')

# plot
fig, ax = plt.subplots(1, 1, figsize = (16, 8))
ax.bar(data.index,
       data['value'])

# formatting xaxis
ax.xaxis.set_major_locator(md.DayLocator(interval = 5))
ax.xaxis.set_major_formatter(md.DateFormatter('%B %d'))
plt.setp(ax.xaxis.get_majorticklabels(), rotation = 90)
ax.set_xlim([data.index[0], data.index[-1]])

plt.show()

that gives me this plot:

[Figure: bar chart with rotated "Month day" tick labels every 5 days]

I converted the dataframe's index from strings to dates, then set the x-axis tick positions and label format with the ax.xaxis.set_major_locator and ax.xaxis.set_major_formatter methods.
I used matplotlib.pyplot for the plot, but it should not be difficult to translate this approach to pylab.
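
For data shaped like the question's (a list of dicts with a "date" field), the same locator and formatter can be applied without pandas. The following is a minimal sketch under that assumption, not the code that produced the plot in this answer. It counts by (month, day) and maps every key onto the leap year 2020 so that February 29 gets its own bar. The helper names `day_of_year_counts` and `plot_day_of_year` and the sample data are made up for illustration.

```python
from collections import Counter
from datetime import date, timedelta

import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import matplotlib.dates as md

LEAP_YEAR = 2020  # any leap year works; it only carries the month and day


def day_of_year_counts(data):
    """Return (days, heights): 366 dates in LEAP_YEAR and the count for each."""
    counts = Counter((d["date"].month, d["date"].day) for d in data)
    days = [date(LEAP_YEAR, 1, 1) + timedelta(days=i) for i in range(366)]
    # Indexing a Counter with a missing key yields 0, so empty days are safe.
    heights = [counts[(d.month, d.day)] for d in days]
    return days, heights


def plot_day_of_year(data):
    """Bar chart with one bar per calendar day and a date label every 5 days."""
    days, heights = day_of_year_counts(data)
    fig, ax = plt.subplots(figsize=(16, 8))
    ax.bar(days, heights, width=1.0)  # width is measured in days
    ax.xaxis.set_major_locator(md.DayLocator(interval=5))
    ax.xaxis.set_major_formatter(md.DateFormatter("%B %d"))
    plt.setp(ax.xaxis.get_majorticklabels(), rotation=90)
    ax.set_xlim(days[0], days[-1])
    ax.set_ylabel("Count")
    return fig, ax


# Hypothetical sample: one event per day from 2000-01-01 through 2003-12-31.
sample = [{"date": date(2000, 1, 1) + timedelta(days=i)} for i in range(1461)]
fig, ax = plot_day_of_year(sample)
```

Calling `fig.savefig("Figure 1.png")` afterwards writes the chart to disk, as in the question. With this sample, every ordinary day is counted four times and February 29 once, which is the dip the question expects.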


EDIT

If you want days and months on separate ticks, you can add a secondary axis (check this example), as in this code:

# import section
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as md
import numpy as np
from datetime import date
from itertools import product
from mpl_toolkits.axes_grid1 import host_subplot
import mpl_toolkits.axisartist as AA

# generate a dataframe like yours
# (days 1-28 only: the '%B %d' parse below defaults to year 1900, which has no February 29)
dates = [date(2020, m, d).strftime("%B %d") for m, d in product(range(1, 13), range(1, 29))]
value = np.abs(np.random.randn(len(dates)))
data = pd.DataFrame({'date': dates,
                     'value': value})
data.set_index('date', inplace = True)

# convert index from str to date
data.index = pd.to_datetime(data.index, format = '%B %d')

# prepare days and months axes
fig = plt.figure(figsize = (16, 8))
days = host_subplot(111, axes_class = AA.Axes, figure = fig)
plt.subplots_adjust(bottom = 0.1)
months = days.twiny()

# position months axis
offset = -20
new_fixed_axis = months.get_grid_helper().new_fixed_axis
months.axis['bottom'] = new_fixed_axis(loc = 'bottom',
                                       axes = months,
                                       offset = (0, offset))
months.axis['bottom'].toggle(all = True)

#plot
days.bar(data.index, data['value'])

# formatting days axis
days.xaxis.set_major_locator(md.DayLocator(interval = 10))
days.xaxis.set_major_formatter(md.DateFormatter('%d'))
plt.setp(days.xaxis.get_majorticklabels(), rotation = 0)
days.set_xlim([data.index[0], data.index[-1]])

# formatting months axis
months.xaxis.set_major_locator(md.MonthLocator())
months.xaxis.set_major_formatter(md.DateFormatter('%b'))
months.set_xlim([data.index[0], data.index[-1]])

plt.show()

which produces this plot:

[Figure: bar chart with day-of-month ticks every 10 days and a separate row of month abbreviations below]
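
If the axisartist machinery feels heavy, one commonly used alternative is to keep a single axis and split the two label levels between major and minor ticks. The sketch below shows that different technique; it is not an explanation of how the secondary-axis code works. It assumes the data has already been mapped onto one leap year (2020 here), and the random `values` are placeholder data.

```python
from datetime import date, timedelta

import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import matplotlib.dates as md
import numpy as np

# Placeholder data: one random value per day of a leap year.
days = [date(2020, 1, 1) + timedelta(days=i) for i in range(366)]
values = np.abs(np.random.default_rng(0).standard_normal(len(days)))

fig, ax = plt.subplots(figsize=(16, 8))
ax.bar(days, values, width=1.0)

# Minor ticks: plain day numbers on the 10th and 20th of each month.
ax.xaxis.set_minor_locator(md.DayLocator(bymonthday=(10, 20)))
ax.xaxis.set_minor_formatter(md.DateFormatter("%d"))

# Major ticks: the 1st of each month, labelled with the day number on the
# first line and the month abbreviation on a second line underneath.
ax.xaxis.set_major_locator(md.MonthLocator())
ax.xaxis.set_major_formatter(md.DateFormatter("%d\n%b"))

ax.set_xlim(days[0], days[-1])
```

The newline in the major format string is what creates the second row of labels, so no extra axis has to be positioned. The trade-off is that the month names sit under the first of each month rather than being centred under the month.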

Zephyr
  • thanks for your answer -- it seems like the pandas dependency isn't really doing much here, is there a reason you included it? – Jason Sundram Jun 15 '20 at 19:18
  • I used `itertools`, `numpy` and `pandas` only to build a `data` object like yours; if you use a different data type, that is fine. The real work is done by the `ax.xaxis` methods. Does it work for you? – Zephyr Jun 15 '20 at 19:26
  • One thing I'd like to do is remove the repeated month labels and show "Jan Feb Mar Apr ... Dec" on the bottom of the x-axis and have a separate set of labels with the numbers (every 5 days as you have done is great) -- do you know how to accomplish that? – Jason Sundram Jun 15 '20 at 19:41
  • That's a nice answer. You may also want to look at the concise date formatter https://matplotlib.org/3.1.0/gallery/ticks_and_spines/date_concise_formatter.html, which puts the month name at the first day of the month – Jody Klymak Jun 16 '20 at 05:39
  • It seems like you had two key insights here: 1) use `twiny` to create a second x axis, and 2) use a good understanding of matplotlib internals to move the x-axis to the bottom of the chart. If you can explain how part 2) works, that would be great. I did something inspired by your code, but using a different approach to 2) -- it also feels a bit verbose / heavy -- see the gist below. Is there a simpler way? https://gist.github.com/jsundram/ef0543b7c86128faba2c81b887459eaf – Jason Sundram Jun 17 '20 at 23:23

Modifying the accepted answer just a bit gives:

locator = md.MonthLocator(bymonthday=(1, 15))
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(md.ConciseDateFormatter(locator))
#plt.setp(ax.xaxis.get_majorticklabels(), rotation = 90 )
ax.set_xlim([data.index[0], data.index[-1]])

plt.show()

[Figure: the same bar chart with ConciseDateFormatter labels on the 1st and 15th of each month]
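
The snippet in this answer relies on `ax` and `data` from the accepted answer. A self-contained sketch of the same idea might look like the following. The random `values` are placeholder data, and `show_offset=False` is an addition not present in the snippet: it hides the offset text, which would otherwise display the placeholder year.

```python
from datetime import date, timedelta

import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import matplotlib.dates as md
import numpy as np

# Placeholder data: one random value per day of a leap year.
days = [date(2020, 1, 1) + timedelta(days=i) for i in range(366)]
values = np.abs(np.random.default_rng(0).standard_normal(len(days)))

fig, ax = plt.subplots(figsize=(16, 8))
ax.bar(days, values, width=1.0)

# Ticks on the 1st and 15th of every month.
locator = md.MonthLocator(bymonthday=(1, 15))
ax.xaxis.set_major_locator(locator)

# ConciseDateFormatter labels the 1st with the month name and other ticks
# with the day number. The year 2020 is only a carrier, so hide the offset.
ax.xaxis.set_major_formatter(md.ConciseDateFormatter(locator, show_offset=False))

ax.set_xlim(days[0], days[-1])
```

Because the labels are short, they fit horizontally and the 90-degree rotation from the accepted answer is not needed.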

Jody Klymak