1

This is what my data looks like:

       month  total_mobile_subscription
0    1997-01                     414000
1    1997-02                     423000
2    1997-03                     431000
3    1997-04                     479000
4    1997-05                     510000
..       ...                        ...
279  2020-04                    9056300
280  2020-05                    8928800
281  2020-06                    8860000
282  2020-07                    8768500
283  2020-08                    8659000

[284 rows x 2 columns]

Basically, I'm trying to change this into a dataset sorted by year with the value being the mean average of total mobiles subscriptions for each year.

I am not sure what to do as I am still learning this.

jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
Lim Ze Kai
  • 13
  • 2

2 Answers2

1

Convert values of month to datetimes and aggregate by years:

y = pd.to_datetime(df['month']).dt.year.rename('year')
df1 = df.groupby(y)['total_mobile_subscription'].mean().reset_index()

df['month'] = pd.to_datetime(df['month'])
df1 = (df.groupby(df['month'].dt.year.rename('year'))['total_mobile_subscription']
         .mean().reset_index())

Or aggregate by first 4 values of month column:

df2 = (df.groupby(df['month'].str[:4].rename('year'))['total_mobile_subscription']
         .mean().reset_index())
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
0

import pandas as pd
import numpy as np
df = pd.DataFrame({
    'year': ['1997-01', '1997-02', '1997-03', '1998-01', '1998-02', '1998-03'],
    'sale': [500, 1000, 1500, 2000, 1000, 400]
    })


for a in range(1,13):
    x = '-0'+ str(a)
    df['year'] = df['year'].str.replace(x, '')
    

df2 = (df.groupby(['year']).mean('sale'))
print(df2)

Elkhan
  • 360
  • 2
  • 15