I have a daily dataset that has a categorical and numerical column. So, I want to change the daily dataset to the monthly dataset. How can I do that using python? For example, if I have a dataset similar to the picture below how can I bring it in per month having a categorical value and sum for a numerical column. It was easy to sum for the numerical columns but having categorical makes it difficult. The categorical value is most likely to be the same throughout the daily data.
Asked
Active
Viewed 885 times
1

Bad Coder
- 177
- 11
-
You groupby the categorical column `df.groupby(["Name", pd.Grouper("Date", freq="MS")["Close"].ohlc()`. Include some sample data as text, not picture for better help – Code Different Jun 03 '22 at 01:12
-
Hi, Thank you for the response. I used your solution but I got an error. I replaced **.ohlc()** with **sum()** and I got an error. `--------------------------------------------------------------------------- TypeError Traceback (most recent call last) /var/folders/6j/0bj57ss10ggbdk87dtdkbgyw0000gn/T/ipykernel_20122/3921144906.py in
----> 1 new_df=new_df.groupby(["Name", 'Customer_City', 'Test_Test',], pd.Grouper("Date", freq="MS"))["Close", "Open", "High", "Low", "Close"].sum() TypeError: __init__() got multiple values for argument 'freq'` – Bad Coder Jun 03 '22 at 02:49
1 Answers
2
import pandas as pd
from statistics import mode
df1_ohlc = df.set_index('Date').groupby('Name').resample('M').agg({'Open':'first','High':'max','Low':'min','Close','last':'Volume': 'sum'}).reset_index()
df2_mode = df.set_index('Date').groupby('Name').resample('M').agg({'Open':mode,'High':mode,'Low':mode,'Close',mode:'Volume': mode}).reset_index()

Christian Eslabon
- 685
- 4
- 8
-
Please include an explanation with your answer to help readers understand how this works, and solves the problem. You can click the edit button at the bottom of your answer to add an explanation. Additionally, you may want to have a read of [how to answer](https://stackoverflow.com/help/how-to-answer) – Freddy Mcloughlan Jun 03 '22 at 01:47
-
Thank you for the response. I really appreciate your help. I have updated my question with a solution. We can convert daily data into monthly. It is difficult for categorical variables. Suppose within a month there is a change in a categorical variable. For a group by column **Name**, there is a change in the **Test_Test** column, and in a month there is a 10 Alpha value and 20 Bravo value for Name **AAA**. Is there a way we can do mode instead of first and last? – Bad Coder Jun 03 '22 at 02:48
-
-
We cannot choose **Name** first because it replaces all the values for that column. **Name** contains the group of stock so we should do group by. Is there a way to do groupby? – Bad Coder Jun 03 '22 at 03:47
-
-
On this part `.agg({'Open':'first','High':'max','Low':'min','Close','last':'Volume':'sum'})` can we use **mode** inplace of max, min,last, and so on ? – Bad Coder Jun 03 '22 at 05:50