I wrote this code to draw a histogram of the date values by month. It shows the number of dates in each month across the whole dataset, but I want the histogram broken out by month within each year. That is, I should get January through December for year 1, then January through December for year 2, and so on.

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

plt.style.use('ggplot')  # pd.options.display.mpl_style was removed from later pandas versions

sns.set_context("talk")

df = pd.read_csv("data.csv", names=['lender', 'loan', 'country', 'sector', 'amount', 'date'], header=None)
df['date'] = pd.to_datetime(df['date'])  # parse the date strings
df.groupby(df['date'].dt.month).count().plot(kind="bar")
HimanAB
  • create a year column, a month column, then group by both. – Paul H Jul 06 '16 at 16:33
  • It says, I cannot groupby TWO columns. I did this: df.groupby(mydefinedMonth, myDefinedYear).count().plot(kind="bar") – HimanAB Jul 06 '16 at 16:36
  • you have to pass both columns as a single list per the docstring of `pandas.DataFrame.groupby` – Paul H Jul 06 '16 at 16:37
  • Sorry but your answer is not clear. – HimanAB Jul 06 '16 at 16:39
  • I made a tuple (year, month) and I passed this tuple to the groupby and now it works. But the labels (date values in x-axis) are outside of the picture. Is there a way I can make the labels smaller? – HimanAB Jul 06 '16 at 16:43

1 Answer

According to the `groupby` docstring, the `by` parameter is:

list of column names. Called on each element of the object index to determine the groups. If a dict or Series is passed, the Series or dict VALUES will be used to determine the groups

So your code simply becomes:

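df = pd.read_csv(...)
df['date'] = pd.to_datetime(df['date'])  # parse the date strings
df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month
# group by year first so the bars run January through December within each year
df.groupby(by=['year', 'month']).count().plot(kind="bar")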

But I would write this as:

ax = (
    pd.read_csv(...)
        .assign(date=lambda df: pd.to_datetime(df['date']))
        .assign(year=lambda df: df['date'].dt.year)
        .assign(month=lambda df: df['date'].dt.month)
        .groupby(by=['year', 'month'])
        .count()
        .plot(kind="bar")
)

This gives you a matplotlib axes object, `ax`, that you can use to adjust the tick labels (e.g., see "matplotlib x-axis ticks dates formatting and locations").
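For the follow-up about labels running off the figure, here is a minimal sketch of one way to use the axes object. The small DataFrame is a made-up stand-in for your CSV, and the label format, rotation, and font size are just assumptions to tune. It also uses `.size()` rather than `.count()` so there is a single bar per (year, month) group, which is a swap on my part and not required.

```python
import pandas as pd

# Hypothetical sample data standing in for the CSV in the question.
df = pd.DataFrame({
    "amount": [25, 50, 75, 100, 125, 150],
    "date": ["2015-01-10", "2015-01-22", "2015-03-05",
             "2016-01-15", "2016-02-01", "2016-02-20"],
})
df["date"] = pd.to_datetime(df["date"])

# Number of rows per (year, month); the index is sorted by year, then month.
counts = (
    df.assign(year=df["date"].dt.year, month=df["date"].dt.month)
      .groupby(["year", "month"])
      .size()
)

ax = counts.plot(kind="bar", figsize=(10, 4))

# Replace the default "(year, month)" tuple labels with shorter, smaller, rotated ones.
labels = [f"{year}-{month:02d}" for year, month in counts.index]
ax.set_xticklabels(labels, rotation=45, ha="right", fontsize=8)
ax.set_xlabel("year-month")
ax.set_ylabel("number of rows")

# Shrink the plot area as needed so the labels stay inside the figure.
ax.figure.tight_layout()
```

From there, `ax.figure.savefig(...)` or `plt.show()` (after importing `matplotlib.pyplot as plt`) will render the result.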

Paul H