
I have a dataset that can be generated like this:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

date_range = pd.date_range(start='2021-11-20', end='2022-01-09').to_list()
# DataFrame.append was removed in pandas 2.0, so collect the rows in a list first
rows = []
for d in date_range * 3:
    if np.random.randint(0, 2) == 0:
        rows.append({'Date': d, 'Values': np.random.randint(1, 11)})
df_left = pd.DataFrame(rows, columns=['Date', 'Values'])
df_left["year-week"] = df_left["Date"].dt.strftime("%Y-%U")

df_right = pd.DataFrame(
    {
        "Date": date_range,
        "Values": np.random.randint(0, 50 , len(date_range)),
    }
)

df_right_counted = df_right.resample('W', on='Date')['Values'].sum().to_frame().reset_index()
df_right_counted["year-week"] = df_right_counted["Date"].dt.strftime("%Y-%U")

df_right_counted:

        Date  Values year-week
0 2021-12-05     135   2021-49
1 2021-12-12     219   2021-50
2 2021-12-19     136   2021-51
3 2021-12-26     158   2021-52
4 2022-01-02     123   2022-01
5 2022-01-09     222   2022-02

And df_left:

         Date Values year-week
0  2021-12-01     10   2021-48
1  2021-12-05      1   2021-49
2  2021-12-07      5   2021-49
...
13 2022-01-07      7   2022-01
14 2022-01-08      9   2022-01
15 2022-01-09      6   2022-02

I'd like to create this graph in matplotlib:

[Image: the desired graph]

In it, a boxplot of df_left uses the y-axis on the left, and a normal line plot of df_right_counted uses the y-axis on the right.

Even after applying the fix from Javier's comment, I am completely stuck on:

  • making both plots start from the same week (I'd like to start from 2021-49)
  • plotting another y-axis on the right and letting the line plot use it (solved by the twin axis from Javier's comment)

This is my attempt so far:

fig, ax = plt.subplots(nrows=1, ncols=1, dpi=100)
fig.tight_layout()
fig.set_tight_layout(True)
fig.set_facecolor('white')

ax2=ax.twinx()

df_left.boxplot(figsize=(31, 8), column='Values', by='year-week', ax=ax)
df_right_counted.plot(figsize=(31, 8), x='year-week', y='Values', ax=ax2)
plt.show()

[Image: result of the attempt]

Could you give me some guidance? I am still learning to use matplotlib.

Imperial A
    Create a [twin](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.twinx.html) axis by `ax2=ax.twinx()` and then e.g. indicate in the second plot that you use this twin axis: `df_right_counted.plot(figsize=(31, 8), x='year-week', y='Values', ax=ax2)`. Hope this helps. – Javier TG Jan 09 '22 at 16:10
  • Thanks, it helped! This completely fixes the problem of creating the new y-axis! Now I am trying to understand why the line plot starts before the boxplot. I'll update the question right now. – Imperial A Jan 09 '22 at 16:21
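For readers new to twinx, here is a minimal sketch of what the comment's fix does, using made-up numbers rather than the question's data: the twin Axes shares the x-axis of the original one but gets its own y-axis on the right, so each plot keeps its own y scale.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax2 = ax.twinx()  # new Axes sharing ax's x-axis, with its y-axis drawn on the right

# Illustrative data only: small values on the left axis, larger ones on the right
ax.plot([0, 1, 2], [1, 5, 3], color="tab:blue")
ax2.plot([0, 1, 2], [120, 200, 150], color="tab:orange")
ax.set_ylabel("left axis")
ax2.set_ylabel("right axis")

# Both Axes report the same x-limits, while their y-limits stay independent
shared_xlim = ax.get_xlim() == ax2.get_xlim()
```

This is why passing ax=ax2 to the second pandas plot is enough: pandas draws onto the twin Axes, and matplotlib handles the second y-axis.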

1 Answer


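One of the problems is that resample('W', on='Date') and .dt.strftime("%Y-%U") lead to different week numbers in the two dataframes: resample('W') labels each weekly bin with the Sunday that ends it, while %U starts a new week on Sunday, so each label falls one week later than most of the days it sums. Another problem is that boxplot places its boxes at positions 1, 2, 3, ... by default, while the line plot of string labels uses 0, 1, 2, ...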
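Both effects can be checked in isolation. The sketch below is not part of the answer; the toy boxplot data and the week labels "a", "b", "c" are made up. It shows that resample("W") (an alias for "W-SUN") labels each bin with the Sunday that closes it, which %U already counts as the first day of the next week, and that matplotlib's boxplot puts boxes at 1, 2, 3 while pandas plots non-numeric x values at 0, 1, 2.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the check runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# --- 1) resample("W") labels vs. %U week numbers ---
dates = pd.date_range(start="2021-11-20", end="2022-01-09")  # same range as the question
daily = pd.Series(1, index=dates)

# "W" means "W-SUN": each bin ends on a Sunday and is labelled with that Sunday
weekly = daily.resample("W").sum()

for label in weekly.index[1:3]:
    monday = label - pd.Timedelta(days=6)  # first day of this Mon-Sun bin
    # %U starts weeks on Sunday, so the label is one %U week after the bin's Monday
    print(label.date(), label.strftime("%Y-%U"), monday.strftime("%Y-%U"))

# --- 2) default x positions: matplotlib boxplot vs. pandas line plot of strings ---
fig, ax = plt.subplots()
ax.boxplot([[1, 2, 3], [2, 3, 4], [3, 4, 5]])  # no positions given: defaults to 1, 2, 3
box_positions = list(ax.get_xticks())

fig2, ax2 = plt.subplots()
pd.DataFrame({"week": ["a", "b", "c"], "v": [1, 2, 3]}).plot(x="week", y="v", ax=ax2)
line_positions = list(ax2.lines[0].get_xdata())  # non-numeric x values are plotted at 0, 1, 2
```

With these two offsets combined, the line plot appears shifted relative to the boxes, which matches what the question describes.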

Some possible workarounds:

  • make boxplot number its boxes starting from zero (via its positions parameter), the same x positions the line plot uses
  • create the counts by first extracting the year-week and then using groupby; that way the week numbers are consistent between both dataframes

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

date_range = pd.date_range(start='2021-11-20', end='2022-01-09').to_list()
# DataFrame.append was removed in pandas 2.0, so collect the rows in a list first
rows = []
for d in date_range * 3:
    if np.random.randint(0, 2) == 0:
        rows.append({'Date': d, 'Values': np.random.randint(1, 11)})
df_left = pd.DataFrame(rows, columns=['Date', 'Values'])
df_left["year-week"] = df_left["Date"].dt.strftime("%Y-%U")

df_right = pd.DataFrame({"Date": date_range,
                         "Values": np.random.randint(0, 50, len(date_range))})
df_right["year-week"] = df_right["Date"].dt.strftime("%Y-%U")

df_right_counted = df_right.groupby('year-week')['Values'].sum().to_frame().reset_index()

fig, ax = plt.subplots(nrows=1, ncols=1, dpi=100)
fig.tight_layout()
fig.set_tight_layout(True)
fig.set_facecolor('white')

ax2 = ax.twinx()

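# positions 0, 1, 2, ... (boxplot's default is 1, 2, 3, ...) so the boxes line up with the line plot
df_left.boxplot(figsize=(31, 8), column='Values', by='year-week', ax=ax,
                positions=np.arange(len(df_left['year-week'].unique())))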
df_right_counted.plot(figsize=(31, 8), x='year-week', y='Values', ax=ax2)
plt.show()
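The code above plots every week present in the data. To start both plots at 2021-49, as the question asks, one option (a sketch, not part of the original answer) is to filter both frames on the year-week string before plotting; because "%Y-%U" labels are zero-padded, string comparison follows chronological order. The sketch calls matplotlib's Axes.boxplot directly instead of DataFrame.boxplot and passes positions starting at 0, like the line. The seeded generator, keeping all three values per day (no random dropping), the figure size and colors are illustrative assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded so the sketch is reproducible
date_range = pd.date_range(start="2021-11-20", end="2022-01-09")

# Left data: three random values per day (a stand-in for the question's df_left)
df_left = pd.DataFrame({"Date": date_range.repeat(3),
                        "Values": rng.integers(1, 11, 3 * len(date_range))})
df_left["year-week"] = df_left["Date"].dt.strftime("%Y-%U")

# Right data: one value per day, summed per year-week with groupby
df_right = pd.DataFrame({"Date": date_range,
                         "Values": rng.integers(0, 50, len(date_range))})
df_right["year-week"] = df_right["Date"].dt.strftime("%Y-%U")
df_right_counted = df_right.groupby("year-week")["Values"].sum().reset_index()

# Keep only weeks from 2021-49 onwards in both frames
start = "2021-49"
df_left = df_left[df_left["year-week"] >= start]
df_right_counted = df_right_counted[df_right_counted["year-week"] >= start]

# One group of left values per week, in the same order as the line plot
weeks = df_right_counted["year-week"].tolist()
groups = [df_left.loc[df_left["year-week"] == w, "Values"] for w in weeks]
positions = np.arange(len(weeks))  # 0, 1, 2, ... instead of boxplot's default 1, 2, 3, ...

fig, ax = plt.subplots(figsize=(12, 4))
ax2 = ax.twinx()
ax.boxplot(groups, positions=positions)
ax2.plot(positions, df_right_counted["Values"], color="tab:red", marker="o")
ax.set_xticks(positions)
ax.set_xticklabels(weeks)
ax.set_ylabel("Values (df_left)")
ax2.set_ylabel("Weekly sum (df_right)")
fig.tight_layout()
```

Building the box groups from the same week list as the line also keeps the two plots aligned if df_left happens to have no values for some week; that week then simply gets an empty box instead of shifting every later box.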

[Image: combining boxplot and lineplot]

JohanC
  • Thanks a lot! I didn't know that boxplot labels the boxes starting from one. Now my graphs start to make sense; I only need to polish them. Also, for information: I had some inconsistencies in the line plot around the last week of December and the first week of January, so I replaced `%U` with `%V` as suggested [here](https://stackoverflow.com/questions/2600775/how-to-get-week-number-in-python) – Imperial A Jan 09 '22 at 21:01
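A note on the `%V` switch: `%V` is the ISO 8601 week number, and it is normally paired with `%G` (the ISO year) rather than `%Y`. Around New Year the two years can differ; for example, 2022-01-01 belongs to ISO week 52 of 2021, so `"%Y-%V"` labels it `2022-52`, which sorts after every other week. A minimal check, assuming a platform whose `strftime` supports `%G` and `%V` (glibc-based Linux does), cross-checked against pandas' `Series.dt.isocalendar()`:

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2021-12-26", "2021-12-27", "2022-01-01", "2022-01-03"]))

mixed = dates.dt.strftime("%Y-%V")  # calendar year + ISO week: can disagree at year boundaries
iso = dates.dt.strftime("%G-%V")    # ISO year + ISO week: always consistent with each other

# The same ISO labels built without strftime, from pandas' ISO calendar fields
cal = dates.dt.isocalendar()
iso_from_cal = cal["year"].astype(str) + "-" + cal["week"].astype(str).str.zfill(2)
```

Using `"%G-%V"` (or the isocalendar() fields) for both dataframes keeps the week labels in chronological order across the year boundary.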