2

I have the following dataframe, with the following values. I want to insert rows in order to have a row for every person (Toby, Jane, David), and for every month in 2020. If there is no value for x or y, then fill with 0.

    ID  Name    Date        x   y
0   001 Toby    2020-01-01  15  NaN
1   001 Toby    2020-02-01  12  7
2   001 Toby    2020-05-01  7   1
3   001 Toby    2020-07-01  NaN 1
4   002 Jane    2020-11-01  20  1
5   002 Jane    2020-12-01  21  10
6   003 David   2020-07-01  -3  2

The resulting dataframe should have 36 rows, 12, for each person.

ID  Name        Date        x   y
0   001 Toby    2020-01-01  15  0
1   001 Toby    2020-02-01  12  7
2   001 Toby    2020-03-01  0   0
3   001 Toby    2020-04-01  0   0
4   001 Toby    2020-05-01  7   1
5   001 Toby    2020-06-01  0   0
6   001 Toby    2020-07-01  0   1
7   001 Toby    2020-08-01  0   0
8   001 Toby    2020-09-01  0   0
9   001 Toby    2020-10-01  0   0
10  001 Toby    2020-11-01  0   0
11  001 Toby    2020-12-01  0   0
12  002 Jane    2020-01-01  0   0
13  002 Jane    2020-02-01  0   0
14  002 Jane    2020-03-01  0   0
15  002 Jane    2020-04-01  0   0
16  002 Jane    2020-05-01  0   0
17  002 Jane    2020-06-01  0   0
18  002 Jane    2020-07-01  0   0
19  002 Jane    2020-08-01  0   0
20  002 Jane    2020-09-01  0   0
21  002 Jane    2020-10-01  0   0
22  002 Jane    2020-11-01  20  1
23  002 Jane    2020-12-01  21  10
24  003 David   2020-01-01  0   0
25  003 David   2020-02-01  0   0
26  003 David   2020-03-01  0   0
27  003 David   2020-04-01  0   0
28  003 David   2020-05-01  0   0
29  003 David   2020-06-01  0   0
30  003 David   2020-07-01  -3  2
31  003 David   2020-08-01  0   0
32  003 David   2020-09-01  0   0
33  003 David   2020-10-01  0   0
34  003 David   2020-11-01  0   0
35  003 David   2020-12-01  0   0

I looked into reindex, and managed to make it work on a single series. But I haven't found a way to generate rows dynamically on a dataframe to then fill the missing values.

Any help would be appreciated.

Louis GRIMALDI
  • 101
  • 1
  • 12

2 Answers2

4

You can use reindex for the purpose:

# list of the desired dates
# make sure that it has the same type with `Date` in your data
# here I assume strings
dates = pd.Series([f'2020-{x}-01' for x in range(1,13)]), name='Date')

(df.set_index(['Date']).groupby(['ID','Name'])
   .apply(lambda x: x.drop(['ID', 'Name'],axis=1).reindex(dates).fillna(0))
   .reset_index()
)
Quang Hoang
  • 146,074
  • 10
  • 56
  • 74
1

Another method is a Cartesian product of dates and your names whilst merging with your original dataframe.

dates = pd.date_range(start='01-01-2020',end='12-01-2020',freq='MS')
dates = pd.DataFrame(dates,columns=['Date']).assign(key='key')

names = df[['Name','ID']].drop_duplicates()

df1 = pd.merge(names.assign(key='key'),dates,on='key',how='outer').drop('key',axis=1)

df2 = pd.merge(df,df1,how='right',on=['Date','Name','ID']).fillna(0)\
                                          .sort_values(['ID','Date'])

print(df2)

    ID   Name       Date     x     y
0    1   Toby 2020-01-01  15.0   0.0
1    1   Toby 2020-02-01  12.0   7.0
7    1   Toby 2020-03-01   0.0   0.0
8    1   Toby 2020-04-01   0.0   0.0
2    1   Toby 2020-05-01   7.0   1.0
9    1   Toby 2020-06-01   0.0   0.0
3    1   Toby 2020-07-01   0.0   1.0
10   1   Toby 2020-08-01   0.0   0.0
11   1   Toby 2020-09-01   0.0   0.0
12   1   Toby 2020-10-01   0.0   0.0
13   1   Toby 2020-11-01   0.0   0.0
14   1   Toby 2020-12-01   0.0   0.0
15   2   Jane 2020-01-01   0.0   0.0
16   2   Jane 2020-02-01   0.0   0.0
17   2   Jane 2020-03-01   0.0   0.0
18   2   Jane 2020-04-01   0.0   0.0
19   2   Jane 2020-05-01   0.0   0.0
20   2   Jane 2020-06-01   0.0   0.0
21   2   Jane 2020-07-01   0.0   0.0
22   2   Jane 2020-08-01   0.0   0.0
23   2   Jane 2020-09-01   0.0   0.0
24   2   Jane 2020-10-01   0.0   0.0
4    2   Jane 2020-11-01  20.0   1.0
5    2   Jane 2020-12-01  21.0  10.0
25   3  David 2020-01-01   0.0   0.0
26   3  David 2020-02-01   0.0   0.0
27   3  David 2020-03-01   0.0   0.0
28   3  David 2020-04-01   0.0   0.0
29   3  David 2020-05-01   0.0   0.0
30   3  David 2020-06-01   0.0   0.0
6    3  David 2020-07-01  -3.0   2.0
31   3  David 2020-08-01   0.0   0.0
32   3  David 2020-09-01   0.0   0.0
33   3  David 2020-10-01   0.0   0.0
34   3  David 2020-11-01   0.0   0.0
35   3  David 2020-12-01   0.0   0.0
Umar.H
  • 22,559
  • 7
  • 39
  • 74
  • 1
    That works ! Going through cartesian product was something I was looking at, but hoped not to use because of the numerous merges to do. But it works, thank you @Datanoice – Louis GRIMALDI May 20 '20 at 16:02
  • 1
    @LouisGRIMALDI merges are part and parcel of any data person's life :) – Umar.H May 20 '20 at 16:14