I have a dataframe where the index is a datetimeindex
, and every row is every day over the course of a couple years. I need to resample the dataframe by month where the two float columns are summed, but the string columns are all the unique values during that month. I can do the resampling to a single column, but I don't know how to do it to everything, or how to combine them back together if I do it one at a time.
For the floats I was trying:
# go through the column list
for col in col_list:
# process all run time columns for month
if "float" in str(col):
# resample for one month and sum
df[col] = df[col].resample('M').sum()
# rename the column
df.rename(columns={col: col + " MONTHLY"}, inplace=True)
and for the strings:
elif "string" in str(col):
# get all the unique jobs run during the month
df[col] = df[col].groupby(pd.Grouper(freq='M')).unique()
df.rename(columns={col: col + " MONTHLY"}, inplace=True)
these were resulting in the monthly data being inserted into tho the dataframe with every day still existing though, and was hard to find and not what I need.
Some sample data:
float_1 float_2 string_1 string_2
12/30/2019 1 2 a a
12/31/2019 1 3 a b
1/1/2020 2 4 a c
1/2/2020 3 5 b d
The expected output would be:
12/2019 2 5 a a, b
1/2020 5 9 a, b c, d
Not sure if it matters but the real data does have NaN in random days throughout the data.