
I have the following DataFrame, to which I want to apply bfill:

amount  percentage
Nan
1.0     20
2.0     10
Nan
Nan
Nan
Nan
3.0     50
4.0     10
Nan
5.0     10

I want to bfill the Nan values in the amount column according to the value in the percentage column: if the percentage for a value is 50, then only 50% of the Nan entries immediately before that value should be filled (a partial fill). For example, the amount 3.0 has a percentage of 50, so of the 4 Nan entries before it, only 2 (50%) are to be back-filled.

proposed output:

amount  percentage
Nan
1.0     20
2.0     10
Nan
Nan
3.0
3.0
3.0     50
4.0     10
Nan
5.0     10

Please help.

  • Share the code you have developed and the errors you get when you run it; no one will do the work that you are supposed to do. – allexiusw Sep 10 '21 at 04:39

1 Answer


Create groups according to NaNs

df['group_id'] = df.amount.where(df.amount.isna(), 1).cumsum().bfill()
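To see why this one-liner produces one group per run of NaNs plus the value that follows it, the chain can be split into its three steps. This is only an illustrative sketch: the intermediate names `markers` and `counts` are made up here, and the sample column is assumed to hold real NaN values rather than strings.

```python
import numpy as np
import pandas as pd

# 'amount' column from the question, assuming real NaN values (not the string 'Nan')
amount = pd.Series([np.nan, 1, 2, np.nan, np.nan, np.nan, np.nan, 3, 4, np.nan, 5])

# Step 1: keep NaN where the value is missing, replace every real value with 1
markers = amount.where(amount.isna(), 1)

# Step 2: running count of real values; cumsum skips NaN and leaves NaN in those rows
counts = markers.cumsum()

# Step 3: each NaN row takes the id of the next real value below it
group_id = counts.bfill()

print(group_id.tolist())
# [1.0, 1.0, 2.0, 3.0, 3.0, 3.0, 3.0, 3.0, 4.0, 5.0, 5.0]
```

Rows 3 to 7 share id 3.0, so the four NaNs and the value 3.0 are handed to the filling function together.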

Create a filling function

def custom_fill(x):

    # Calculate number of rows to be filled
    max_fill_rows = math.floor(x.iloc[-1, 1] * (x.shape[0] - 1) / 100)

    # Fill only if number of rows to fill is not zero
    return x.bfill(limit=max_fill_rows) if max_fill_rows else x

Fill the DataFrame

df.groupby('group_id').apply(custom_fill)

Output

   amount  percentage group_id
0     NaN         NaN      1.0
1     1.0        20.0      1.0
2     2.0        10.0      2.0
3     NaN         NaN      3.0
4     NaN         NaN      3.0
5     3.0        50.0      3.0
6     3.0        50.0      3.0
7     3.0        50.0      3.0
8     4.0        10.0      4.0
9     NaN         NaN      5.0
10    5.0        10.0      5.0

PS: Don't forget to import the required libraries

import math
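For reference, here is a self-contained sketch of the same idea run end to end on the question's data. It is a variant, not the exact code above: it looks up `percentage` by name instead of by position, fills only the `amount` column, and iterates over the groups instead of calling `groupby(...).apply`, which sidesteps differences in how `apply` indexes its result across pandas versions. The names `partial_bfill` and `amount_filled` are made up for this example, and the data is assumed to contain real NaN values.

```python
import math

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "amount": [np.nan, 1, 2, np.nan, np.nan, np.nan, np.nan, 3, 4, np.nan, 5],
    "percentage": [np.nan, 20, 10, np.nan, np.nan, np.nan, np.nan, 50, 10, np.nan, 10],
})

# Same grouping as in the answer: a run of NaNs plus the value that follows it
group_id = df["amount"].where(df["amount"].isna(), 1).cumsum().bfill()


def partial_bfill(amounts, percentages):
    """Back-fill only the given percentage of the NaNs preceding the last value."""
    pct = percentages.iloc[-1]
    if pd.isna(pct):
        return amounts  # no percentage available, leave the group untouched
    n_fill = math.floor(pct * (len(amounts) - 1) / 100)
    # bfill needs a positive limit, so skip the call when nothing is to be filled
    return amounts.bfill(limit=n_fill) if n_fill else amounts


pieces = [partial_bfill(g["amount"], g["percentage"]) for _, g in df.groupby(group_id)]
df["amount_filled"] = pd.concat(pieces)  # aligned back to df by index

print(df)
```

With the sample data, only rows 5 and 6 are filled (with 3.0), matching the proposed output in the question.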
Vishnudev Krishnadas
  • Thanks, but bfill is not working as it should. (No Nan values in amount are filled.) – Mayur Zambare Sep 10 '21 at 07:10
  • @MayurZambare This solution does work with your example (or similar) data. The only limitation is that the `percentage` column has to be the second column in your DataFrame. You can change `x.iloc[-1, 1]` to `x.iloc[-1].loc['percentage']` to work with different column layouts. – Michael Szczesny Sep 10 '21 at 07:23
  • Yeah. Please customize according to your data. I can only guide you. @MayurZambare. Please follow Michael's solution. – Vishnudev Krishnadas Sep 10 '21 at 07:53
  • @Vishnudev I have tried that, but the Nan values are still not filled. I am using pandas version 1.2.4. – Mayur Zambare Sep 10 '21 at 15:00
  • @MayurZambare You need to do `df['amount'] = df.amount.replace('Nan', np.nan)` before all this, since the `NaN` in your column is actually the string 'Nan'. – Vishnudev Krishnadas Sep 12 '21 at 08:27
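A quick way to check for the problem described in the last comment is to count missing values before doing anything else: if the column holds the string 'Nan', `isna()` finds nothing to fill. The sketch below uses a made-up five-row frame and shows `pd.to_numeric(..., errors='coerce')` as one possible alternative to the `replace` call from the comment; whether it suits your data depends on what else the column contains.

```python
import pandas as pd

# Hypothetical frame in which missing amounts were read in as the text 'Nan'
df = pd.DataFrame({"amount": ["Nan", "1.0", "2.0", "Nan", "3.0"]})

# The strings are ordinary values, so pandas sees no missing data yet
missing_before = int(df["amount"].isna().sum())

# Convert to numbers; anything that is not a valid number ends up as a real NaN
df["amount"] = pd.to_numeric(df["amount"], errors="coerce")
missing_after = int(df["amount"].isna().sum())

print(missing_before, missing_after)  # 0 2
```

After the conversion the column is numeric, so the grouping and `bfill` steps from the answer have real NaN values to work with.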