I have the following DataFrame, grouped by datafile, and I want to apply fillna(method='bfill') only to those groups that contain more than half of the data.
df.groupby('datafile').count()
datafile column1 column2 column3 column4
datafile1 5 5 3 4
datafile2 5 5 4 5
datafile3 5 5 5 5
datafile4 5 5 0 0
datafile5 5 5 1 1
As you can see in the counts above, I'd like to fill the groups that contain most of the information, but not those that have little or none. So I was thinking of a condition along these lines: fill the NaNs in groups that have more than half of the counts, and leave the rest (those with less than half) unfilled.
I'm struggling with how to set up the condition, since it involves working with both the result of a groupby and the original df.
Any help is appreciated.
example df:
index datafile column1 column2 column3 column4
0 datafile1 5 5 NaN 20
1 datafile1 6 6 NaN 21
2 datafile1 7 7 9 NaN
3 datafile1 8 8 10 23
4 datafile1 9 9 11 24
5 datafile2 3 3 2 7
6 datafile2 4 4 3 8
7 datafile2 5 5 4 9
8 datafile2 6 6 NaN 10
9 datafile2 7 7 6 11
10 datafile3 10 10 24 4
11 datafile3 11 11 25 5
12 datafile3 12 12 26 6
13 datafile3 13 13 27 7
14 datafile3 14 14 28 8
15 datafile4 4 4 NaN NaN
16 datafile4 5 5 NaN NaN
17 datafile4 6 6 NaN NaN
18 datafile4 7 7 NaN NaN
19 datafile4 8 8 NaN NaN
19 datafile4 9 9 NaN NaN
20 datafile5 7 7 1 3
21 datafile5 8 8 NaN NaN
22 datafile5 9 9 NaN NaN
23 datafile5 10 10 NaN NaN
24 datafile5 11 1 NaN NaN
expected output df:
index datafile column1 column2 column3 column4
0 datafile1 5 5 9 20
1 datafile1 6 6 9 21
2 datafile1 7 7 9 23
3 datafile1 8 8 10 23
4 datafile1 9 9 11 24
5 datafile2 3 3 2 7
6 datafile2 4 4 3 8
7 datafile2 5 5 4 9
8 datafile2 6 6 6 10
9 datafile2 7 7 6 11
10 datafile3 10 10 24 4
11 datafile3 11 11 25 5
12 datafile3 12 12 26 6
13 datafile3 13 13 27 7
14 datafile3 14 14 28 8
15 datafile4 4 4 NaN NaN
16 datafile4 5 5 NaN NaN
17 datafile4 6 6 NaN NaN
18 datafile4 7 7 NaN NaN
19 datafile4 8 8 NaN NaN
19 datafile4 9 9 NaN NaN
20 datafile5 7 7 1 3
21 datafile5 8 8 NaN NaN
22 datafile5 9 9 NaN NaN
23 datafile5 10 10 NaN NaN
24 datafile5 11 1 NaN NaN
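For reference, the condition described above could be sketched roughly as follows. This is only one possible reading, not a definitive solution: it assumes "more than half" is judged per group and per column (so each column of a datafile is filled or left alone independently), and it uses a cut-down stand-in for the example df with just datafile1 and datafile5. The names value_cols, frac, mask and result are made up for illustration. It calls the groupby bfill() method rather than fillna(method='bfill'), since recent pandas versions deprecate the method= argument.

```python
import numpy as np
import pandas as pd

# Cut-down stand-in for the example df: only datafile1 and datafile5,
# with values copied from the question.
df = pd.DataFrame({
    "datafile": ["datafile1"] * 5 + ["datafile5"] * 5,
    "column1": [5, 6, 7, 8, 9, 7, 8, 9, 10, 11],
    "column3": [np.nan, np.nan, 9, 10, 11, 1, np.nan, np.nan, np.nan, np.nan],
    "column4": [20, 21, np.nan, 23, 24, 3, np.nan, np.nan, np.nan, np.nan],
})

# Hypothetical name: the columns that should be considered for filling.
value_cols = ["column1", "column3", "column4"]

# Fraction of non-null values per (group, column), broadcast back to every
# row of the original df so it lines up with df[value_cols].
frac = (
    df[value_cols]
    .notna()
    .astype(float)
    .groupby(df["datafile"])
    .transform("mean")
)

# Assumption: "more than half" means strictly greater than 0.5.
mask = frac > 0.5

# Backward-fill within each group, then keep the filled values only where
# the mask holds; everywhere else the original values are kept.
filled = df.groupby("datafile")[value_cols].bfill()
result = df.copy()
result[value_cols] = filled.where(mask, df[value_cols])

print(result)
```

With this reading, column3 and column4 of datafile1 are back-filled (3 of 5 and 4 of 5 values present), while the same columns of datafile5 are left untouched (1 of 5 present). If the threshold should instead apply to the group as a whole, the mask would need to be reduced across columns first.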