1

Say I have two columns in a data frame, one of which is incomplete.

df = pd.DataFrame({'a': [1, 2, 3, 4], 'b':[5, '', 6, '']})

df
Out: 
      a   b
0     1   5
1     2   
2     3   6
3     4

Is there a way to fill the empty values in column b with the corresponding values in column a, while leaving the rest of column b intact and without iterating over the column, so that you obtain the following?

df
Out: 
      a   b
0     1   5
1     2   2
2     3   6
3     4   4

I think you can use the apply method, but I am not sure. For reference, the dataset I'm dealing with is quite large (approx. 1 GB), which is why iteration, my first attempt, was not a good idea.

Zero
skailasa

4 Answers

2

If the blanks are empty strings, you could assign through a boolean mask with loc:

In [165]: df.loc[df['b'] == '', 'b'] = df['a']

In [166]: df
Out[166]:
   a  b
0  1  5
1  2  2
2  3  6
3  4  4
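
For anyone who wants to run this outside an IPython session, here is a minimal self-contained sketch of the same idea. The variable name blank is only illustrative, and it assumes the blanks really are empty strings (not whitespace or NaN).

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [5, '', 6, '']})

# Boolean mask marking the rows where b holds an empty string.
blank = df['b'] == ''

# Assign only into the masked rows; the values from a are aligned on the index.
df.loc[blank, 'b'] = df.loc[blank, 'a']

result = df['b'].tolist()
print(df)
```

This is a single vectorised assignment, so no Python-level loop over the rows is involved.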

However, if your blanks are NaNs, you could use fillna:

In [176]: df
Out[176]:
   a    b
0  1  5.0
1  2  NaN
2  3  6.0
3  4  NaN

In [177]: df['b'] = df['b'].fillna(df['a'])

In [178]: df
Out[178]:
   a    b
0  1  5.0
1  2  2.0
2  3  6.0
3  4  4.0
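
If the column currently holds empty strings but you would rather go the fillna route, one possible sketch is to coerce b to numeric first. This assumes b contains only numbers and blanks: with errors='coerce', any other text would also become NaN and then be overwritten by a. The name b_numeric is just for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [5, '', 6, '']})

# Coerce b to numeric; unparseable entries (here the empty strings) become NaN.
b_numeric = pd.to_numeric(df['b'], errors='coerce')

# Fill the NaNs from a (aligned on the index), then restore a's integer dtype.
df['b'] = b_numeric.fillna(df['a']).astype(df['a'].dtype)

print(df)
```

The final astype avoids ending up with the float column (5.0, 2.0, ...) shown in the output above.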
Zero
2

You can use np.where to evaluate df.b: if a value is not empty, keep it; otherwise use the value from df.a instead.

df.b = np.where(df.b, df.b, df.a)

df
Out[33]: 
   a  b
0  1  5
1  2  2
2  3  6
3  4  4
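
One thing to watch: the condition above relies on truthiness, so a legitimate 0 in b would be treated as empty as well. A hedged variant with an explicit comparison is sketched below; the sample data, with a 0 in the first row, is made up here purely to show the difference.

```python
import numpy as np
import pandas as pd

# Hypothetical data: b contains a real 0 that should be kept.
df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [0, '', 6, '']})

# Take a only where b is exactly the empty string; keep b everywhere else.
df['b'] = np.where(df['b'] == '', df['a'], df['b'])

result = df['b'].tolist()
print(df)
```

Note that the argument order is flipped relative to the one-liner above, because the condition now marks the rows to replace rather than the rows to keep.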
Allen Qin
1

You can use pd.Series.where with a boolean version of df.b, because '' evaluates to False:

df.assign(b=df.b.where(df.b.astype(bool), df.a))

   a  b
0  1  5
1  2  2
2  3  6
3  4  4
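
A closely related sketch uses Series.mask, which is the inverse of where: it replaces values where the condition is True. Here the condition is an explicit comparison with '' rather than a boolean cast, which is my own substitution and not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [5, '', 6, '']})

# mask() swaps in values from a wherever b equals the empty string.
# assign() returns a new frame and leaves df itself unchanged.
out = df.assign(b=df['b'].mask(df['b'] == '', df['a']))

print(out)
```

As with the assign call above, the original df is not modified, so reassign the result if you want to keep it.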
piRSquared
0

You can use replace and ffill with axis=1:

df.replace('', np.nan).ffill(axis=1).astype(df.a.dtypes)

Output:

   a  b
0  1  5
1  2  2
2  3  6
3  4  4
Scott Boston