I need to generate an array that represents monthly profits like this:
[
[2008, None, None, None, 100, 100, 100, 100, 100, 100, 100, 100, 100],
[2009, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100],
# ecc...
[2019, 100, 100, 100, 100, 100, 100, 100, 100, None, None, None, None, None]
]
In other words: year, sum of profits for month 1, sum of profits for month 2, ecc
with None
where there is no info for that month.
from a dataframe of daily profits like this
date
2008-04-01 0.0
2008-04-02 10.0
2008-04-03 10.0
2008-04-04 10.0
2008-04-05 10.0
Name: profit, dtype: float64
I call df["profit"].groupby(pd.Grouper(freq='M')).sum()
and get:
date
2008-04-30 100.0
2008-05-31 100.0
2008-06-30 100.0
2008-07-31 100.0
2008-08-31 100.0
Freq: M, Name: profit, dtype: float64
Now I was thinking to do something like this, in pseudo-code that doesn't work:
start = df["date"].min().to_pydatetime()
end = df["date"].max().to_pydatetime()
result = [
[start.year]
]
idx = 0
for date, monthly_profit in df["profit"].groupby(pd.Grouper(freq='M')).sum().iterrows():
if date.year !== result[idx][0]:
idx += 1
result[idx] = [date.year]
month = 1
while month <= 12:
if date.month == month:
result[idx].append(monthly_profit)
else:
result[idx].append(None)
month += 1
Is there anyway to do this without iterating? If not, how can I iterate and read the date value?
EDIT, in response to QuickBeam's answer, here's my solution to avoid the problem when not all 12 months are present in the original data:
if len(df.groupby([df.date.dt.month]).agg({"date": "count"})) < 12:
# All months should be represented by at least one data point for the display to work properly
# If not all months are present, we insert empty data
min_year = df["date"].min().to_pydatetime().year
for m in range(1, 13):
if df.loc[df.date.dt.month == m].empty:
df = df.append(pd.DataFrame({"date": datetime(min_year, m, 1), column_name: [np.nan]}))
else:
min_year = df.loc[df.date.dt.month == m]["date"].iloc[0].to_pydatetime().year