Is there a way (more efficient than using a for loop) to replace all the nulls in a pandas DataFrame with the max value of their respective rows?
-
What do you mean by "more efficient"? The time complexity of what you are trying to do cannot be improved from the basic implementation (loop through each row, compute max, fill nulls with max), as you need to look at every element at least once. – James Jul 29 '15 at 17:40
-
Generally with Pandas you can perform operations on the full frame at once using internal optimized functions that are faster than looping through the frame yourself. For instance, df.mul(df2) is faster than looping through the frames simultaneously and doing the multiplication in Python. Similar to how numpy works. – rhaskett Jul 29 '15 at 18:02
1 Answer
I guess that is what you are looking for:
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 0], 'b': [3, 0, 10], 'c':[0, 5, 34]})
   a   b   c
0  1   3   0
1  2   0   5
2  0  10  34
You can use apply to iterate over all rows and replace 0 with the row's maximum using the replace function, which gives you the expected output:
df.apply(lambda row: row.replace(0, max(row)), axis=1)
    a   b   c
0   1   3   3
1   2   5   5
2  34  10  34
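The same replacement can probably also be written without a row-wise lambda. A minimal sketch, assuming the df with zeros from above (the names row_max and filled are only illustrative): mask swaps every cell where the condition holds for the matching row maximum, and axis=0 aligns the Series of maxima with the row index.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 0], 'b': [3, 0, 10], 'c': [0, 5, 34]})

# Row-wise maxima as a Series indexed like df's rows.
row_max = df.max(axis=1)

# Replace cells equal to 0 with their row's maximum; axis=0 aligns
# row_max with the row index rather than with the column labels.
filled = df.mask(df.eq(0), row_max, axis=0)
print(filled)
```

I have not timed this against the apply version either, so treat it as an alternative to try rather than a guaranteed speed-up.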
If you want to replace NaN, which seems to be your actual goal according to your comment, you can use
import numpy as np
df = pd.DataFrame({'a': [1, 2, np.nan], 'b': [3, np.nan, 10], 'c': [np.nan, 5, 34]})
     a     b     c
0  1.0   3.0   NaN
1  2.0   NaN   5.0
2  NaN  10.0  34.0
df.T.fillna(df.max(axis=1)).T
yielding
      a     b     c
0   1.0   3.0   3.0
1   2.0   5.0   5.0
2  34.0  10.0  34.0
which might be more efficient (have not done the timings) than
df.apply(lambda row: row.fillna(row.max()), axis=1)
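Since I have not done the timings, here is a rough sketch of how one could check. Everything about the benchmark is an assumption for illustration: the frame df_big (2,000 rows, 5 columns, roughly 20% NaN, first column kept complete so no row is entirely NaN), the helper names, and the extra NumPy variant numpy_fill, which is not part of the answer above and is only included for comparison.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical benchmark data: 2,000 x 5 floats with about 20% NaN.
rng = np.random.default_rng(0)
values = rng.random((2_000, 5))
nan_mask = rng.random(values.shape) < 0.2
nan_mask[:, 0] = False  # keep one value per row so every row has a max
values[nan_mask] = np.nan
df_big = pd.DataFrame(values, columns=list("abcde"))


def transpose_fill(frame):
    # Transpose so rows become columns, fill per column, transpose back.
    return frame.T.fillna(frame.max(axis=1)).T


def apply_fill(frame):
    # Row-wise apply; Series.max skips NaN by default.
    return frame.apply(lambda row: row.fillna(row.max()), axis=1)


def numpy_fill(frame):
    # Pure NumPy variant: broadcast each row's NaN-ignoring max.
    arr = frame.to_numpy(dtype=float)
    row_max = np.nanmax(arr, axis=1, keepdims=True)
    filled = np.where(np.isnan(arr), row_max, arr)
    return pd.DataFrame(filled, index=frame.index, columns=frame.columns)


for fn in (transpose_fill, apply_fill, numpy_fill):
    seconds = timeit.timeit(lambda: fn(df_big), number=3)
    print(f"{fn.__name__}: {seconds / 3:.4f} s per call")
```

The numbers will depend on the shape of the frame and the pandas version, so it is worth running this on data that resembles your own before picking one.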
Please note that
df.apply(lambda row: row.fillna(max(row)), axis=1)
does not work in every case, as explained here.
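The linked explanation is not reproduced here, so the following is my own sketch of the likely cause rather than a summary of that link. Python's built-in max relies on > comparisons, and every comparison with NaN is False, so the result depends on where the NaN sits: a leading NaN is never displaced. Series.max, by contrast, skips NaN by default. The variable names are illustrative only.

```python
import math

import numpy as np
import pandas as pd

nan = float("nan")

# Built-in max is order-dependent when NaN is present.
first = max([nan, 10.0, 34.0])  # NaN comes first and is never displaced
later = max([10.0, nan, 34.0])  # NaN comes later and never wins
print(first, later)

df = pd.DataFrame({'a': [1, 2, np.nan], 'b': [3, np.nan, 10], 'c': [np.nan, 5, 34]})

# Built-in max: the last row starts with NaN, so it is "filled" with NaN.
builtin_result = df.apply(lambda row: row.fillna(max(row)), axis=1)

# Series.max ignores NaN, so every row gets a real maximum.
method_result = df.apply(lambda row: row.fillna(row.max()), axis=1)

print(builtin_result)
print(method_result)
```

With this df only the last row is affected, which is why the built-in version can appear to work on data where no row happens to start with NaN.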

Cleb
-
df.apply(lambda row: row.fillna(max(row)), axis=1) did the trick. Thanks. – rhaskett Jul 29 '15 at 18:30
-
Interesting. Luckily that code is long gone but that could have been a nasty bug. – rhaskett Dec 13 '17 at 08:40