
Is there a way (more efficient than using a for loop) to replace all the nulls in a Pandas DataFrame with the max value in its respective row?

rhaskett
  • What do you mean by "more efficient"? The time complexity of what you are trying to do cannot be improved from the basic implementation (loop through each row, compute max, fill nulls with max), as you need to look at every element at least once. – James Jul 29 '15 at 17:40
  • Generally with Pandas you can perform operations on the full frame at once using internal optimized functions that are faster than looping through the frame yourself. For instance, df.mul(df2) is faster than looping through the frames simultaneously and doing the multiplication in Python, similar to how NumPy works. – rhaskett Jul 29 '15 at 18:02
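
As a minimal sketch of that point (not part of the original thread; the frame sizes and column names below are made up for illustration), a vectorized call and an explicit Python loop give the same result, but the vectorized version runs in pandas/NumPy internals and is typically much faster:

import numpy as np
import pandas as pd

# two frames of the same shape with made-up data
df1 = pd.DataFrame(np.random.rand(1000, 3), columns=['a', 'b', 'c'])
df2 = pd.DataFrame(np.random.rand(1000, 3), columns=['a', 'b', 'c'])

# vectorized: the whole multiplication happens in optimized internals
vectorized = df1.mul(df2)

# explicit Python loop over every cell
looped = pd.DataFrame(index=df1.index, columns=df1.columns, dtype=float)
for i in df1.index:
    for col in df1.columns:
        looped.at[i, col] = df1.at[i, col] * df2.at[i, col]

assert np.allclose(vectorized, looped)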

1 Answer


I guess this is what you are looking for:

import numpy as np  # needed below for np.nan
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 0], 'b': [3, 0, 10], 'c':[0, 5, 34]})


   a   b   c
0  1   3   0
1  2   0   5
2  0  10  34

You can use apply to iterate over all rows and replace 0 with the row's maximum using the replace function, which gives you the expected output:

df.apply(lambda row: row.replace(0, max(row)), axis=1)

    a   b   c
0   1   3   3
1   2   5   5
2  34  10  34
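
If performance is the concern, a fully vectorized alternative should also work here (a sketch, not benchmarked): where keeps the non-zero values and, with axis=0, aligns the Series of row maxima against the rows:

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 0], 'b': [3, 0, 10], 'c': [0, 5, 34]})

# keep entries that are not 0, otherwise fall back to the row maximum;
# axis=0 aligns the row-max Series with the DataFrame's index
df.where(df.ne(0), df.max(axis=1), axis=0)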

If you want to replace NaN - which, according to your comment, seems to be your actual goal - you can use

df = pd.DataFrame({'a': [1, 2, np.nan], 'b': [3, np.nan, 10], 'c':[np.nan, 5, 34]})

     a     b     c
0  1.0   3.0   NaN
1  2.0   NaN   5.0
2  NaN  10.0  34.0

df.T.fillna(df.max(axis=1)).T

yielding

      a     b     c
0   1.0   3.0   3.0
1   2.0   5.0   5.0
2  34.0  10.0  34.0

which might be more efficient (I have not done the timings) than

df.apply(lambda row: row.fillna(row.max()), axis=1)
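
If you want to check the timings yourself, a rough comparison along these lines (a sketch; the frame size and NaN fraction are made up, and results will vary by machine and pandas version) could look like:

import timeit

import numpy as np
import pandas as pd

# a larger frame with some NaNs sprinkled in
big = pd.DataFrame(np.random.rand(10000, 5))
big[big < 0.2] = np.nan

t_transpose = timeit.timeit(lambda: big.T.fillna(big.max(axis=1)).T, number=20)
t_apply = timeit.timeit(
    lambda: big.apply(lambda row: row.fillna(row.max()), axis=1), number=20)

print(t_transpose, t_apply)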

Please note that

df.apply(lambda row: row.fillna(max(row)), axis=1)

does not work in every case, as explained here.
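
The linked explanation presumably boils down to this: Python's built-in max compares elements pairwise, and every comparison with NaN is False, so max(row) can return NaN when the NaN happens to come first, while Series.max skips NaN by default. A small demonstration (not from the original answer):

import numpy as np
import pandas as pd

row = pd.Series([np.nan, 10.0, 34.0])

print(max(row))   # nan  -> built-in max returns NaN here because NaN comes first
print(row.max())  # 34.0 -> Series.max skips NaN by default (skipna=True)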

Cleb