Combine data from two columns into one, except if second is already occupied in pandas

Question

Say I have two columns in a data frame, one of which is incomplete.

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [5, '', 6, '']})

df
Out: 
      a   b
0     1   5
1     2   
2     3   6
3     4

Is there a way to fill the empty values in column b with the corresponding values in column a, leaving the rest of column b intact and without iterating over the column, so that you obtain the following?

df
Out: 
      a   b
0     1   5
1     2   2
2     3   6
3     4   4

I think you can use the apply method, but I am not sure. For reference, the dataset I'm dealing with is quite large (approx. 1 GB), which is why iteration, my first attempt, was not a good idea.

Are you sure whether the blanks in the dataframe are empty strings `''` or NaNs? - Zero, Jun 26 '17 at 19:58

score 2 · Answer 1 · answered Jun 26 '17 at 20:01

If the blanks are empty strings, you could select those rows with a boolean mask and assign the values from column a:

In [165]: df.loc[df['b'] == '', 'b'] = df['a']

In [166]: df
Out[166]:
   a  b
0  1  5
1  2  2
2  3  6
3  4  4

However, if your blanks are NaNs, you could use fillna:

In [176]: df
Out[176]:
   a    b
0  1  5.0
1  2  NaN
2  3  6.0
3  4  NaN

In [177]: df['b'] = df['b'].fillna(df['a'])

In [178]: df
Out[178]:
   a    b
0  1  5.0
1  2  2.0
2  3  6.0
3  4  4.0
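If it is unclear which kind of blank the column holds, both cases can be covered with a single mask. The following is a minimal sketch, assuming that both `''` and NaN should count as "empty"; the sample frame and the name `is_blank` are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data with one empty string and one NaN in column b (illustrative).
df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [5, '', 6, np.nan]})

# Assumption: a value is "blank" if it is NaN or an empty string.
is_blank = df['b'].isna() | (df['b'] == '')

# Overwrite only the blank rows of b with the values of a from the same rows.
df.loc[is_blank, 'b'] = df.loc[is_blank, 'a']

result = df['b'].tolist()
```

Both approaches above are vectorised, so no Python-level loop over the rows is involved.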

score 2 · Answer 2 · answered Jun 26 '17 at 23:17

You can use np.where to evaluate df.b: where a value is not empty, keep it; otherwise use the corresponding value from df.a.

df.b = np.where(df.b, df.b, df.a)

df
Out[33]: 
   a  b
0  1  5
1  2  2
2  3  6
3  4  4
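One caveat: using the column itself as the condition relies on truthiness, so a legitimate `0` in column b would also be treated as empty. Below is a hedged sketch of a variant that tests for the empty string explicitly; the sample data containing a `0` is an assumption made for illustration.

```python
import numpy as np
import pandas as pd

# Column b contains a genuine 0 that should be kept (illustrative data).
df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [0, '', 6, '']})

# Where b is exactly the empty string take a, otherwise keep b.
df['b'] = np.where(df['b'] == '', df['a'], df['b'])

filled = df['b'].tolist()
```

With the explicit comparison the `0` in the first row is kept, and only the two empty strings are replaced.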

Allen Qin


score 1 · Answer 3 · answered Jun 26 '17 at 21:15

You can use pd.Series.where with a boolean version of df.b, because '' evaluates to False:

df.assign(b=df.b.where(df.b.astype(bool), df.a))

   a  b
0  1  5
1  2  2
2  3  6
3  4  4
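The same idea can be written with `Series.mask`, the inverse of `where`, which replaces values where the condition is True. This is a small sketch, assuming the blanks are empty strings; the name `out` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [5, '', 6, '']})

# mask() replaces the entries of b where the condition holds with values from a.
# assign() returns a new frame and leaves df itself unchanged.
out = df.assign(b=df['b'].mask(df['b'] == '', df['a']))

new_b = out['b'].tolist()
```

Because `assign` returns a copy, the result has to be bound to a name (or assigned back to `df`) to be kept.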

piRSquared


score 0 · Answer 4 · answered Jun 26 '17 at 20:21

You can use replace and ffill with axis=1:

df.replace('', np.nan).ffill(axis=1).astype(df.a.dtypes)

Output:

Scott Boston

