5

I have a dataframe that looks like this:

City   Crime_Rate
A      10
B      20
C      inf
D      15

I want to replace the inf with the maximum value of the Crime_Rate column, so that the resulting dataframe looks like this:

City   Crime_Rate
A      10
B      20
C      20
D      15

I tried

df['Crime_Rate'].replace([np.inf],max(df['Crime_Rate']),inplace=True)

But Python takes inf as the maximum value. Where am I going wrong?

Ahamed Moosa

5 Answers

10

Filter out inf values first and then get max of Series:

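m = df.loc[df['Crime_Rate'] != np.inf, 'Crime_Rate'].max()
# assign the result back; inplace=True on a selected column may not update df in recent pandas
df['Crime_Rate'] = df['Crime_Rate'].replace(np.inf, m)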

Another solution:

mask = df['Crime_Rate'] != np.inf
df.loc[~mask, 'Crime_Rate'] = df.loc[mask, 'Crime_Rate'].max()

print (df)
  City  Crime_Rate
0    A        10.0
1    B        20.0
2    C        20.0
3    D        15.0
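
For reference, the masking idea above can be written as a self-contained script. This is only a minimal sketch: the sample frame is rebuilt from the question, and the names `is_finite` and `finite_max` are illustrative. It uses `np.isfinite` so that NaN and -inf are also ignored when computing the maximum, which goes slightly beyond the `!= np.inf` comparison.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (assumed to be floats)
df = pd.DataFrame({
    "City": ["A", "B", "C", "D"],
    "Crime_Rate": [10.0, 20.0, np.inf, 15.0],
})

# True for ordinary numbers, False for inf, -inf and NaN
is_finite = np.isfinite(df["Crime_Rate"])

# Largest value among the finite entries only
finite_max = df.loc[is_finite, "Crime_Rate"].max()

# Overwrite only the +inf entries with that maximum
df.loc[df["Crime_Rate"] == np.inf, "Crime_Rate"] = finite_max

print(df)
```

The plain `max` in the question returns inf because inf is a valid float that compares greater than every finite number, so it has to be excluded before taking the maximum.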
jezrael
4

Here is a solution for a whole matrix/data frame:

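# per-column maxima, keeping only the columns whose maximum is finite
highest_non_inf = df.max().loc[lambda v: v < np.inf].max()
# replace returns a new DataFrame, so assign the result
df = df.replace(np.inf, highest_non_inf)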
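
A variation on the whole-frame idea, offered as a sketch rather than as this answer's method: it assumes every column is numeric and looks at the individual values instead of the per-column maxima, so it still finds a finite maximum when every column contains an inf. The frame and the names `values` and `highest_finite` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical all-numeric frame in which every column contains an inf
df = pd.DataFrame({
    "a": [1.0, np.inf, 3.0],
    "b": [np.inf, 7.0, 2.0],
})

# Flatten to a float array and keep only the finite entries
values = df.to_numpy(dtype=float)
highest_finite = values[np.isfinite(values)].max()

# replace returns a new DataFrame; the original is left untouched
result = df.replace(np.inf, highest_finite)

print(result)
```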

dmeu
3

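Set the pandas option use_inf_as_na to True and then use fillna. Use this only if you want to treat both inf and NaN as missing values, since fillna will then replace NaN as well. Note that recent pandas releases deprecate this option. For example:

pd.options.mode.use_inf_as_na = True

df['Crime_Rate'] = df['Crime_Rate'].fillna(df['Crime_Rate'].max())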

   City  Crime_Rate
0    A        10.0
1    B        20.0
2    C        20.0
3    D        15.0
Bharath M Shetty
  • 1
    Hmm, not sure this is a good idea; `NaN` and `inf` are really different things – jezrael Jun 09 '18 at 10:19
  • 1
    @jezrael Just an alternative. Someday it might be helpful to someone. If I answered the usual way, it would be the same answer as yours. – Bharath M Shetty Jun 09 '18 at 10:20
  • 4
    Then add a note that it also replaces NaNs if they exist :) – jezrael Jun 09 '18 at 10:20
  • @Dark, Good one-line solution, thank you. What I had been doing was writing another line of code to replace nan with the desired value – Ahamed Moosa Jun 09 '18 at 10:33
  • Leave a vote if the solution was helpful, and do not use this solution if you want to replace inf with one value and nan with another. Good luck :) – Bharath M Shetty Jun 09 '18 at 10:35
  • Thanks @Dark, voted helpful. Yes, in my solution I have to replace nan with 0 :) – Ahamed Moosa Jun 09 '18 at 15:18
  • When using this solution, one might want to wrap it in `with pd.option_context("mode.use_inf_as_na", True):`. This way, pandas treats NaN as normal after exiting the `with` block. Refer to https://pandas.pydata.org/pandas-docs/stable/user_guide/options.html – matt Aug 19 '19 at 16:52
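
Following up on the comments above, the same effect can be sketched without touching any global option: convert inf to NaN explicitly, then fill. This is a sketch under assumptions, with a made-up Series that contains both an inf and a NaN; as the comments point out, the NaN gets filled too.

```python
import numpy as np
import pandas as pd

# Made-up sample with one inf and one genuine NaN
s = pd.Series([10.0, 20.0, np.inf, 15.0, np.nan])

# Turn both infinities into NaN without changing any pandas option
cleaned = s.replace([np.inf, -np.inf], np.nan)

# Series.max() skips NaN, so this is the largest finite value
filled = cleaned.fillna(cleaned.max())

print(filled)
```

If NaN should end up with a different value than inf (0 in the asker's case), fill the NaN first or use a mask on the inf entries instead.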
0

One way to do it is to nest an additional replace(np.inf, np.nan) call inside the max computation.

This replaces inf with nan only for the purpose of computing the maximum, so the maximum comes out as the largest finite value rather than inf.

Example below: the max value is 100, and it replaces inf.

#Create dummy data frame
import pandas as pd 
import numpy as np  
a = float('Inf')
v = [1,2,5,a,10,5,a,5,100,2]  
df = pd.DataFrame({'Col_A': v})
#Data frame looks like this
In [33]: df
Out[33]: 
        Col_A
0    1.000000
1    2.000000
2    5.000000
3         inf
4   10.000000
5    5.000000
6         inf
7    5.000000
8  100.000000
9    2.000000

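# Replace inf with the largest finite value
# (Series.max() skips NaN; the built-in max() does not handle NaN reliably)
df['Col_A'] = df['Col_A'].replace(np.inf, df['Col_A'].replace(np.inf, np.nan).max())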

In[35]: df
Out[35]: 
   Col_A
0    1.0
1    2.0
2    5.0
3  100.0
4   10.0
5    5.0
6  100.0
7    5.0
8  100.0
9    2.0

Hope that works!

Ravijeet
0

Use numpy clip to cap the values at a chosen upper bound (100 here). It's concise and vectorized:

import numpy as np
import pandas as pd
df = pd.DataFrame({"x": [-np.inf, +np.inf, np.nan, 4, 3]})
df["x"] = np.clip(df["x"], -np.inf, 100)
# Out:
#       x
# 0   -inf
# 1  100.0
# 2    NaN
# 3    4.0
# 4    3.0

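To get rid of the negative infinity as well, replace the -np.inf lower bound with a finite number. NaN is always unaffected. To clip at the column's own maximum instead of a fixed value, compute the largest finite value first, since df["x"].max() returns inf while inf is still present.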
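
A possible way to combine clipping with the column's own finite range is sketched below. It is an illustration, not part of the original answer: it uses the pandas method `Series.clip` rather than `np.clip`, and the name `finite` is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [-np.inf, np.inf, np.nan, 4.0, 3.0]})

# Keep only the finite entries (drops inf, -inf and NaN)
finite = df["x"][np.isfinite(df["x"])]

# Clip both infinities to the finite range of the column; NaN stays NaN
df["x"] = df["x"].clip(lower=finite.min(), upper=finite.max())

print(df)
```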

Contango