I want to understand how the COVID pandemic is affecting supply-chain industries such as meat processing plants. I retrieved county-level COVID data from the NYT and statistical data from a food agency, and I want to see how COVID cases are surging in counties where major food processing plants are located. I found the relevant data and prepared it for rendering a time series chart. However, I am having trouble getting the right plotting data, because the resulting plot does not show the expected output. Here is what I have tried so far:
my attempt:
The final aggregated COVID time series data I am interested in is in this gist. Here is my current attempt:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import seaborn as sns
from datetime import timedelta, datetime
df = pd.read_csv("https://gist.githubusercontent.com/jerry-shad/7eb2dd4ac75034fcb50ff5549f2e5e21/raw/477c07446a8715f043c9b1ba703a03b2f913bdbf/covid_tsdf.csv")
df.drop(['Unnamed: 0', 'fips', 'non-fed-slaughter', 'fed-slaughter', 'total-slaughter', 'mcd-asl'], axis=1, inplace=True)
for ct in df['county_state'].unique():
    dd = df.groupby([ct, 'date', 'est'])['num-emp'].sum().unstack().reset_index()
    p = sns.lineplot('date', 'values', data=dd, hue='packer', markers=markers, style='cats', ax=axes[j, 0])
    p.set_xlim(data.date.min() - timedelta(days=60), data.date.max() + timedelta(days=60))
plt.legend(bbox_to_anchor=(1.04, 0.5), loc="center left", borderaxespad=0)
But it looks like I made the wrong aggregation above; this attempt does not work. My intention is this: if a company has multiple establishments (the est column), I need to take the sum of its num-emp (number of employees) and then compute the ratio new_deaths / num-emp over time. Basically, I want to track, in an approximate sense, whether a company's staff are affected by COVID. I am not sure what the correct way of doing this with matplotlib in Python is. Can anyone suggest a correction to make this right? Any ideas?
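The aggregation described above could be sketched in pandas like this. It is a minimal sketch, assuming the CSV has one row per county_state, date and est, and that the county-level new_deaths value is repeated on every establishment row of a county and date, so it has to be taken once rather than summed. The function name employee_death_ratio and the demo values are hypothetical. If the file also has a company column (such as the packer used as hue in the attempt), it could be added to the group keys.

```python
import pandas as pd


def employee_death_ratio(df: pd.DataFrame, keys=("county_state", "date")) -> pd.DataFrame:
    """Sum plant employees per group and divide the county's new deaths by that total.

    Assumes one row per (county_state, date, est), with the county-level
    new_deaths value repeated on every establishment row.
    """
    out = (
        df.groupby(list(keys))
        .agg(
            num_emp=("num-emp", "sum"),          # add up staff across establishments
            new_deaths=("new_deaths", "first"),  # county figure: count it once, do not sum
        )
        .reset_index()
    )
    out["deaths_per_emp"] = out["new_deaths"] / out["num_emp"]
    return out


# Hypothetical demo: one county with two establishments over two days
demo = pd.DataFrame({
    "county_state": ["A", "A", "A", "A"],
    "date": pd.to_datetime(["2020-05-01", "2020-05-01", "2020-05-02", "2020-05-02"]),
    "est": ["M1", "M2", "M1", "M2"],
    "num-emp": [100, 300, 100, 300],
    "new_deaths": [2, 2, 4, 4],
})
print(employee_death_ratio(demo))
```

Taking "first" for new_deaths matters: summing it over establishment rows would count the same county deaths once per plant.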
second attempt
I took some inspiration from a recent COVID-19 related post, so here is another way of trying to build what I want in matplotlib. I aggregated the data like this, and also wrote a custom plotting helper function:
df = pd.read_csv("https://gist.githubusercontent.com/jerry-shad/7eb2dd4ac75034fcb50ff5549f2e5e21/raw/477c07446a8715f043c9b1ba703a03b2f913bdbf/covid_tsdf.csv")
ds_states = df.groupby('county_state').sum().rename({'county_state': 'location'})
ds_states['mortality'] = ds_states['deaths'] / ds_states['popestimate2019'] * 1_000_000
ds_states['daily_mortality'] = ds_states['new_deaths'] / ds_states['popestimate2019'] * 1_000_000
ds_states['daily_mortality7'] = ds_states['daily_mortality'].rolling({'time': 7}).mean()
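A pandas-only sketch of the aggregation above, assuming the same row layout (county-level deaths, new_deaths and popestimate2019 repeated for every establishment). Three things in the attempt look off: groupby('county_state').sum() collapses all dates into one row per county and, under that assumption, adds the repeated population estimates together; rename({'county_state': 'location'}) renames index labels rather than columns; and rolling({'time': 7}) looks like xarray-style syntax, whereas pandas rolling() expects an integer (or offset) window. The function name county_mortality and the demo data are hypothetical.

```python
import pandas as pd


def county_mortality(df: pd.DataFrame, window: int = 7) -> pd.DataFrame:
    """Per-county mortality per million plus a rolling mean computed within each county."""
    # County-level columns repeat on every establishment row, so keep one value per county/date
    daily = (
        df.groupby(["county_state", "date"])[["deaths", "new_deaths", "popestimate2019"]]
        .first()
        .sort_index()
    )
    daily["mortality"] = daily["deaths"] / daily["popestimate2019"] * 1_000_000
    daily["daily_mortality"] = daily["new_deaths"] / daily["popestimate2019"] * 1_000_000
    # pandas' rolling() takes an integer window; apply it to each county's series separately
    daily["daily_mortality7"] = (
        daily.groupby(level="county_state")["daily_mortality"]
        .transform(lambda s: s.rolling(window).mean())
    )
    return daily.reset_index()


# Hypothetical demo: one county, two establishments, two days
demo = pd.DataFrame({
    "county_state": ["A"] * 4,
    "date": pd.to_datetime(["2020-05-01", "2020-05-01", "2020-05-02", "2020-05-02"]),
    "est": ["M1", "M2", "M1", "M2"],
    "deaths": [10, 10, 12, 12],
    "new_deaths": [1, 1, 2, 2],
    "popestimate2019": [1_000_000] * 4,
})
res = county_mortality(demo, window=2)
print(res[["county_state", "date", "daily_mortality", "daily_mortality7"]])
```

If a location column name is wanted, rename(columns={"county_state": "location"}) on the result does that.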
Then this is the plotting helper function I came up with:
def subplots(*args, tick_right=True, **kwargs):
    f, ax = plt.subplots(*args, **kwargs)
    if tick_right:
        ax.yaxis.tick_right()
        ax.yaxis.set_label_position("right")
    ax.yaxis.grid(color="lightgrey", linewidth=0.5)
    ax.xaxis.grid(color="lightgrey", linewidth=0.5)
    ax.xaxis.set_tick_params(labelsize=14)
    return f, ax
_, ax1 = subplots(subplot_kw={'xlim': XLIM})
ax1.set(title=f'US covid tracking in meat processing plants by county - Linear scale')
ax2 = ax1.twinx()
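One way to fill in the twin-axis plot, sketched with plain matplotlib: county mortality on the left axis and new deaths per plant employee on the right. It assumes a frame with county_state, date, daily_mortality7 and deaths_per_emp columns (for example, a mortality aggregation and an employee aggregation merged on county_state and date). plot_county and the demo data are hypothetical, and the x-limits are derived from the data because XLIM is not defined in the code shown. Note that the subplots helper above moves the primary y-ticks to the right when tick_right=True, which is also where twinx draws its own axis, so this sketch keeps the primary axis on the left.

```python
from datetime import timedelta

import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop it in a notebook
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd


def plot_county(data: pd.DataFrame, county: str):
    """Left axis: 7-day county mortality; right axis: new deaths per plant employee."""
    d = data[data["county_state"] == county].sort_values("date")
    fig, ax1 = plt.subplots(figsize=(10, 4))
    ax1.yaxis.grid(color="lightgrey", linewidth=0.5)
    ax2 = ax1.twinx()  # second y-axis on the right, sharing the date axis
    (l1,) = ax1.plot(d["date"], d["daily_mortality7"], color="tab:blue",
                     label="daily deaths per million (7-day mean)")
    (l2,) = ax2.plot(d["date"], d["deaths_per_emp"], color="tab:red",
                     label="new deaths per plant employee")
    # Stand-in for the undefined XLIM: pad the data range by a week on each side
    ax1.set_xlim(d["date"].min() - timedelta(days=7), d["date"].max() + timedelta(days=7))
    ax1.xaxis.set_major_formatter(mdates.DateFormatter("%b %d"))
    ax1.set(title=f"{county}: COVID mortality vs. plant workforce", ylabel="deaths per million")
    ax2.set_ylabel("deaths per employee")
    ax1.legend(handles=[l1, l2], loc="upper left")  # one legend covering both axes
    return fig, ax1, ax2


# Hypothetical demo data for a single county
demo = pd.DataFrame({
    "county_state": ["A"] * 3,
    "date": pd.date_range("2020-04-01", periods=3),
    "daily_mortality7": [1.0, 1.5, 2.0],
    "deaths_per_emp": [0.002, 0.004, 0.003],
})
fig, ax1, ax2 = plot_county(demo, "A")
```

Building a single legend from both line handles avoids the two axes drawing overlapping legends.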
But I got stuck again here on how to make this right. My essential goal is to see how much meat processing companies are affected by COVID, because if their workers get infected, the companies' performance will drop. I want to make an EDA that shows this kind of information visually. Can anyone suggest possible ways of doing this with matplotlib? I am open to any feasible EDA approach that makes this question more realistic or meaningful.
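A simple first EDA step could be a ranking of counties by total new deaths per 1,000 plant employees. This is a crude exposure proxy only, since county deaths are not plant deaths. The sketch assumes a frame with one row per county and date and hypothetical columns num_emp (summed plant employees) and new_deaths; num_emp is taken as the maximum per county on the assumption that the workforce figure does not change over time. county_exposure and the demo values are hypothetical.

```python
import pandas as pd


def county_exposure(ratio_df: pd.DataFrame) -> pd.DataFrame:
    """Rank counties by total new deaths per 1,000 plant employees (a crude exposure proxy)."""
    summary = ratio_df.groupby("county_state").agg(
        total_new_deaths=("new_deaths", "sum"),
        plant_employees=("num_emp", "max"),  # assumes the workforce figure is constant over time
    )
    summary["deaths_per_1000_emp"] = (
        summary["total_new_deaths"] / summary["plant_employees"] * 1000
    )
    return summary.sort_values("deaths_per_1000_emp", ascending=False)


# Hypothetical demo: one row per county and date
demo = pd.DataFrame({
    "county_state": ["A", "A", "B", "B"],
    "num_emp": [400, 400, 1000, 1000],
    "new_deaths": [2, 4, 3, 3],
})
print(county_exposure(demo))
```

Calling .plot.barh() on the deaths_per_1000_emp column of the result would turn the ranking into a horizontal bar chart.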
desired output
I am thinking of making an EDA output along these lines: at the county level, I want to see how each company's performance varied because of COVID. Can anyone point me to a way to achieve such an EDA output? Thanks
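For a per-county view, one common EDA layout is small multiples: one panel per county over the same date range, so trends can be compared side by side while each panel keeps its own y-scale. A sketch with matplotlib, assuming a long-format frame with county_state, date and a value column; small_multiples and the demo data are hypothetical, and value_col could be any per-county series, such as a 7-day mortality or a deaths-per-employee column.

```python
import math

import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop it in a notebook
import matplotlib.pyplot as plt
import pandas as pd


def small_multiples(data: pd.DataFrame, value_col: str, ncols: int = 3):
    """One panel per county, same date range on every panel, independent y-scales."""
    counties = sorted(data["county_state"].unique())
    nrows = math.ceil(len(counties) / ncols)
    fig, axes = plt.subplots(nrows, ncols, figsize=(4 * ncols, 3 * nrows), squeeze=False)
    start, end = data["date"].min(), data["date"].max()
    for ax, county in zip(axes.flat, counties):
        d = data[data["county_state"] == county].sort_values("date")
        ax.plot(d["date"], d[value_col])
        ax.set_xlim(start, end)  # common time window for comparison
        ax.set_title(county, fontsize=10)
        ax.tick_params(axis="x", labelrotation=30)
    for ax in axes.flat[len(counties):]:
        ax.set_visible(False)  # hide unused grid cells
    fig.tight_layout()
    return fig, axes


# Hypothetical demo: four counties, three days each
dates = pd.date_range("2020-04-01", periods=3)
demo = pd.DataFrame(
    [{"county_state": c, "date": d, "value": i + k}
     for i, c in enumerate(["A", "B", "C", "D"])
     for k, d in enumerate(dates)]
)
fig, axes = small_multiples(demo, "value", ncols=3)
```

Setting the same x-limits per panel, instead of sharex=True, keeps tick labels on every visible panel even when the bottom row is only partly filled.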
update
Since the kind of EDA I want to make is not quite settled in my mind, I am open to hearing about any EDA that fits the context of the problem I raised above. Thanks in advance!