1

How can I group data into months from dates where a data frame has both categorical and numerical data in pandas. I tried the groupby function but I think it won't work with categorical data. There are multiple values in the categorical column. Sample data:

Date Campaign_Name No_of_Male_Viewers No_of_Female_Viewers
2021-06-12 Dove_birds 1268 7656
2021-02-05 Pantene_winner 657 8964
2021-09-15 Budweiser_wazap 7642 76
2021-05-13 Pantene_winner 425 6578
2021-12-12 Budweiser_wazap 9867 111
2021-09-09 Dove_birds 1578 11456
2021-05-24 Pantene_winner 678 7475
2021-09-27 Budweiser_wazap 8742 96
2021-09-09 Dove_soft 1175 15486

Now I need to group the data months wise and show for example that Budweiser_wazap in September gained a total audience of xxxx and in December gained xxxx audience and so on for the other campaigns as well.

Expected output sample:

Month Campaign_Name No_of_Male_Viewers No_of_Female_Viewers
February Pantene_winner 657 8964
September Budweiser_wazap 16384 172

Since Budweiser_wazap campaign ran twice in September, the resulting output for No_of_Male_Viewers is: 7642 + 8742 = 16384, and for No_of_Female_Viewers is: 76 + 96 = 172.

SSRN
  • 35
  • 4
  • groupby itself just segments the data - there is no restriction on what the column types are. What is the expected output? – anon01 Aug 29 '22 at 05:46
  • Added it in my question. In September Budweiser aired twice but it aggregates the viewership. – SSRN Aug 29 '22 at 05:56
  • please provide actual values. That is the only way to distinguish which of the many similar operations you are actually interested in – anon01 Aug 29 '22 at 06:14
  • I've updated my question, I think this might clarify what I need to accomplish – SSRN Aug 29 '22 at 12:59
  • note that you will want to alos group by `year` if rows with 05/2020 and 05/2021 should be treated/summed separately – anon01 Aug 29 '22 at 15:40

2 Answers2

3

USE-

##Get Month Name for each date
df['Month'] = df['Date'].dt.month_name()

#Groupby `Month` & `Campaign_Name`
df.groupby(['Month', 'Campaign_Name'])[['No_of_Male_viewers', 'No_of_Female_viewers']].sum().reset_index()
df

Sample Reproducible code-

import pandas as pd
import numpy as np
from pandas import DataFrame

df = pd.DataFrame({
    'Date' : ['2015-06-08', '2015-08-05', '2015-05-06', '2015-05-05', '2015-07-08', '2015-05-07', '2015-06-05', '2015-07-05'], 
    'Sym'  : ['aapl', 'aapl', 'aapl', 'aapl', 'aaww', 'aaww', 'aaww', 'aaww'], 
    'Data2': [11, 8, 10, 15, 110, 60, 100, 40],
    'Data3': [5, 8, 6, 1, 50, 100, 60, 120]
})
df['Date'] = pd.to_datetime(df['Date'])
df['Month'] = df['Date'].dt.month_name()
df

df output-

Date           Sym  Data2   Data3  Month
0   2015-06-08  aapl    11  5   June
1   2015-08-05  aapl    8   8   August
2   2015-05-06  aapl    10  6   May
3   2015-05-05  aapl    15  1   May
4   2015-07-08  aaww    110 50  July
5   2015-05-07  aaww    60  100 May
6   2015-06-05  aaww    100 60  June
7   2015-07-05  aaww    40  120 July

Groupby Condition-

df.groupby(['Month', 'Sym'])[['Data2', 'Data3']].sum().reset_index()

Output-

    Month   Sym   Data2 Data3
0   August  aapl    8   8
1   July    aaww    150 170
2   June    aapl    11  5
3   June    aaww    100 60
4   May     aapl    25  7
5   May     aaww    60  100

Ref link- Pandas - dataframe groupby - how to get sum of multiple columns

Divyank
  • 811
  • 2
  • 10
  • 26
0

If you use strftime('%B') - that automatically extracts Month names - you can reach the same result with one line of code :)

#download dataframe from Stackoverflw and convert column to datetime
df=pd.read_clipboard()
df['Date']=pd.to_datetime(df['Date'], infer_datetime_format=True)

#'%B' - will return full month name, '%b' - 3-Letter- like Dec, Sep
df.groupby([df['Date'].dt.strftime('%B'), "Campaign_Name"]).sum()

Here is how the output looks like: enter image description here