3

I know about bfill and ffill to fill values in rows of the same column. But how do you do it when you need to fill values across certain multiple columns in a dataframe?

Here's the example:

Initial df:

import pandas as pd
inidf = [('Prod', ['P1', 'P2']),
 ('A', ['1', '1']),
 ('1', ['', '40']),
 ('2', ['10', '60']),
 ('3', ['30', '']),
 ('B', ['1', '2']),             
 ]
df = pd.DataFrame.from_items(inidf)
df

  Prod  A   1   2   3  B
0   P1  1      10  30  1
1   P2  1  40  60      2

Target df:

tgtdf = [('Prod', ['P1', 'P2']),
 ('A', ['1', '1']),
 ('1', ['10', '40']),
 ('2', ['10', '60']),
 ('3', ['30', '60']),
 ('B', ['1', '2']),             
 ]
df2 = pd.DataFrame.from_items(tgtdf)
df2

  Prod  A   1   2   3  B
0   P1  1  10  10  30  1
1   P2  1  40  60  60  2

In my example above, the columns to be targeted are Columns named 1, 2 and 3. In the first row, the first target column (named 1) has a missing value and is copied from the next populated Column in this case (named 2). In the second row, last target column (named 3) has a missing value and is copied from the previous populated Column in this case (named 2).

Zero
  • 74,117
  • 18
  • 147
  • 154
Jdoe
  • 111
  • 2
  • 10
  • is necessary find rows for `fill` and for `bfill` ? Is possible if no value in `2` column sometimes is replace by `bfill` and sometimes by `ffill`? There is multiple columns? – jezrael Aug 13 '17 at 16:08

3 Answers3

3

You can use replace first for convert empty spaces to NaNs.

Then select rows for bfill and for ffill replacing with axis=1 for replace by rows:

df = df.replace('', np.nan)
bfill_rows = [0] #if necessary specify more values of index
ffill_rows = [1] #if necessary specify more values of index

df.loc[bfill_rows] = df.loc[bfill_rows].bfill(axis=1)
df.loc[ffill_rows] = df.loc[ffill_rows].ffill(axis=1)
print (df)
  Prod  A   1   2   3  B
0   P1  1  10  10  30  1
1   P2  1  40  60  60  2

If necessary is possible specify columns also:

df = df.replace('', np.nan)
cols = ['1','2','3']
bfill_rows = [0]
ffill_rows = [1]

df.loc[bfill_rows, cols] = df.loc[bfill_rows, cols].bfill(axis=1)
df.loc[ffill_rows, cols] = df.loc[ffill_rows, cols].ffill(axis=1)
print (df)

  Prod  A   1   2   3  B
0   P1  1  10  10  30  1
1   P2  1  40  60  60  2
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
2

Replace all blanks with NaNs and first ffill and then bfill on axis=1 for columns '1','2','3'

In [31]: df[['1','2','3']] = df[['1','2','3']].replace('', np.nan).ffill(1).bfill(1)

In [32]: df
Out[32]:
  Prod  A   1   2   3  B
0   P1  1  10  10  30  1
1   P2  1  40  60  60  2
Zero
  • 74,117
  • 18
  • 147
  • 154
1

First of all, replace the empty quotes with NaN values. Then ffill or bfill as needed, specifying axis=0. The axis is 0 when selecting a given row because the result of such a selection is a series. If you were to select multiple rows (e.g. the entire dataframe), then the axis would be 1

df = df.replace('', np.nan)
df.iloc[0, :].bfill(axis=0, inplace=True)  # Backfill first row.
df.iloc[1, :].ffill(axis=0, inplace=True)  # Forwardfill second row.

>>> df
  Prod  A   1   2   3  B
0   P1  1  10  10  30  1
1   P2  1  40  60  60  2
Alexander
  • 105,104
  • 32
  • 201
  • 196