
I need to group the data by website and get the average views for specific date ranges. My data looks like this:

date        website         amount_views
1/1/2021        a               23
1/2/2021        a               17
1/3/2021        a               10
1/4/2021        a               25
1/5/2021        a               2
1/1/2021        b               12
1/2/2021        b               7
1/3/2021        b               5
1/4/2021        b               17
1/5/2021        b               2

So I need to see the average for websites a and b for two date ranges: 1/1/2021 - 1/3/2021 (pre) and 1/3/2021 - 1/5/2021 (post). The desired output is:

date        website         avg_amount_views
pre            a                 31.5
post           a                 35.6
pre            b                 15.5
post           b                 22.6
Chique_Code

4 Answers


You can use np.where and date.between to assign the pre/post status, then group by that status and website and take the mean.

In one line (though not as readable):

  import numpy as np
  import pandas as pd

  df['date'] = pd.to_datetime(df['date'])
  df.groupby([np.where(df['date'].between('1/1/2021', '1/3/2021'), 'pre', 'post'),
              'website'])['amount_views'].mean().to_frame('mean')

Step by step (more readable):

df['date'] = pd.to_datetime(df['date'])
df['status'] = np.where(df['date'].between('1/1/2021', '1/3/2021'), 'pre', 'post')
df.groupby(['status', 'website'])['amount_views'].mean().to_frame('mean')

                     mean
status website           
post   a        13.500000
       b         9.500000
pre    a        16.666667
       b         8.000000
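
If you want the result as a flat frame shaped like the desired output in the question (status and website as columns, with pre listed before post), one way, building on the step-by-step snippet above, is:

# flatten the grouped result; assumes the 'status' column created above
out = (df.groupby(['status', 'website'])['amount_views']
         .mean()
         .rename('avg_amount_views')
         .loc[['pre', 'post']]        # list the pre rows before the post rows
         .reset_index())
print(out)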
wwnde
Use pandas.Grouper and specify the freq parameter as 'W' for weekly grouping.
import pandas as pd

# test dataframe
data = {'date': ['1/1/2021', '1/2/2021', '1/3/2021', '1/4/2021', '1/5/2021', '1/1/2021', '1/2/2021', '1/3/2021', '1/4/2021', '1/5/2021'], 'website': ['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b'], 'amount_views': [23, 17, 10, 25, 2, 12, 7, 5, 17, 2]}

df = pd.DataFrame(data)

# set the date column to a datetime format - required
df.date = pd.to_datetime(df.date)

# groupby with pd.Grouper
mean_visits = df.groupby([pd.Grouper(key='date', freq='W'), 'website'])['amount_views'].mean().reset_index(name='mean_visits')

# display(mean_visits)
        date website  mean_visits
0 2021-01-03       a    16.666667
1 2021-01-03       b     8.000000
2 2021-01-10       a    13.500000
3 2021-01-10       b     9.500000
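
Note that the weekly buckets line up with the pre/post split only because 2021-01-03 is a Sunday and freq='W' ends weeks on Sunday by default. If you want the pre/post labels in the result, one option (just a sketch, assuming the mean_visits frame from above; the labels dict is mine) is to map the week-end dates:

import pandas as pd

# map each week-end timestamp to the label used in the question
labels = {pd.Timestamp('2021-01-03'): 'pre', pd.Timestamp('2021-01-10'): 'post'}
mean_visits['date'] = mean_visits['date'].map(labels)
print(mean_visits)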
Trenton McKinney

Use:

import numpy as np
import pandas as pd

dates = pd.to_datetime(df['date'])
# classify each row as 'pre' or 'pos' and group by website plus that label
new_df = (df.groupby(['website', np.select((dates.between('1/1/2021', '1/3/2021'),
                                            dates.between('1/3/2021', '1/5/2021')),
                                           ('pre', 'pos'))])
            .amount_views
            .mean()
            .rename_axis(('website', 'date'))
            .reset_index(name='avg_amount_views'))
print(new_df)

  website date  avg_amount_views
0       a  pos         13.500000
1       a  pre         16.666667
2       b  pos          9.500000
3       b  pre          8.000000
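
One thing to keep in mind with np.select: 1/3/2021 satisfies both conditions, and np.select takes the first matching condition, so it is counted in 'pre'. Any date outside both ranges would fall back to np.select's default of 0 unless you pass one; a small sketch of that, reusing the dates series from above (the 'other' label is just illustrative):

import numpy as np

# rows matching neither range get 'other' instead of np.select's default of 0
label = np.select((dates.between('1/1/2021', '1/3/2021'),
                   dates.between('1/3/2021', '1/5/2021')),
                  ('pre', 'pos'),
                  default='other')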
ansev

You can use pd.cut to define 'pre' and 'post':

import pandas as pd

df['date'] = pd.to_datetime(df['date'])  # pd.cut needs real datetimes here, not strings

grp = pd.cut(df['date'], bins=[pd.Timestamp(2021, 1, 1),
                               pd.Timestamp(2021, 1, 3),
                               pd.Timestamp(2021, 1, 6)],
             labels=['pre', 'post'], right=False)

df.groupby([grp, 'website'])['amount_views'].agg(['mean', 'count']).reset_index()

Output:

   date website       mean  count
0   pre       a  20.000000      2
1   pre       b   9.500000      2
2  post       a  12.333333      3
3  post       b   8.000000      3
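
Because right=False makes each bin closed on the left and open on the right, 1/3/2021 falls into 'post' here, whereas the np.where/np.select answers above count it in 'pre'. A quick way to check the assignment, assuming the grp series from the snippet above:

import pandas as pd

# show which bucket each date was assigned to
print(pd.DataFrame({'date': df['date'], 'bucket': grp}).drop_duplicates())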
Scott Boston