The goal is to keep two columns consistent: wherever one column is NaN, the corresponding value in the other column should also be set to NaN.

Having the following data frame:

df = pd.DataFrame({'a': [np.nan, 2, np.nan, 4], 'b': [11, 12, 13, 14]})

     a   b
0  NaN  11
1    2  12
2  NaN  13
3    4  14

Propagating the NaN values from column a to column b gives:

     a    b
0  NaN  NaN
1    2   12
2  NaN  NaN
3    4   14

One way to achieve this is:

df.b.where(~df.a.isnull(), np.nan)

Is there any other way to maintain such a relationship?

Krzysztof Słowiński

5 Answers

You could use mask on the rows where a is NaN.

In [366]: df.mask(df.a.isnull())
Out[366]:
     a     b
0  NaN   NaN
1  2.0  12.0
2  NaN   NaN
3  4.0  14.0

To mask rows that have a NaN in any column, use df.mask(df.isnull().any(axis=1))
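A minimal sketch contrasting the two masks. It assumes a hypothetical frame, not the one from the question, in which row 2 is missing only in column b, so the difference between the two conditions becomes visible.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 0 has NaN in a, row 2 has NaN only in b.
df = pd.DataFrame({'a': [np.nan, 2, 3, 4], 'b': [11, 12, np.nan, 14]})

# Mask rows where column a is NaN: only row 0 is blanked.
by_a = df.mask(df['a'].isnull())

# Mask rows where any column is NaN: rows 0 and 2 are blanked.
by_any = df.mask(df.isnull().any(axis=1))
```

With the first mask, row 2 keeps its value in a; with the second, the whole row becomes NaN.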

Zero

Use pd.Series.notnull to avoid having to negate your Boolean series:

df.b.where(df.a.notnull(), np.nan)

But, really, there's nothing wrong with your existing solution.
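Note that where returns a new Series, so the result has to be assigned back if the frame itself should change. A minimal sketch on the question's data; it relies on the other argument of where defaulting to NaN, so that argument is left out.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [np.nan, 2, np.nan, 4], 'b': [11, 12, 13, 14]})

# Keep b where a is present; elsewhere where() fills in NaN by default.
# The result is float, because an integer column cannot hold NaN.
df['b'] = df['b'].where(df['a'].notnull())
```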

jpp

Another one would be:

df.loc[df.a.isnull(), 'b'] = df.a

It isn't shorter, but it does the job.
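One caveat, offered as a sketch rather than a definitive rule: column b holds integers, and recent pandas versions may warn or raise when NaN is written in place into an integer column. Casting b to float first sidesteps this. The cast is an added assumption and not part of the line above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [np.nan, 2, np.nan, 4], 'b': [11, 12, 13, 14]})

# Assumption: cast b to float up front so that it can store NaN in place.
df['b'] = df['b'].astype(float)

# Overwrite b with NaN in the rows where a is missing.
df.loc[df['a'].isnull(), 'b'] = np.nan
```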

zipa

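Using dropna with reindex (dropna removes every row that has a NaN in any column, and reindex restores those rows as all-NaN):

In [151]: df.dropna().reindex(df.index)
Out[151]: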
     a     b
0  NaN   NaN
1  2.0  12.0
2  NaN   NaN
3  4.0  14.0

BENY

Using np.where():

df['b'] = np.where(df.a.isnull(), df.a, df.b)

How it works: np.where(condition, x, y) returns elements from x where condition is True and from y elsewhere. Here, b takes the value of a (NaN) where a is null and keeps its own value otherwise.

Output:

>>> df
     a     b
0  NaN   NaN
1  2.0  12.0
2  NaN   NaN
3  4.0  14.0

Van Peer