I have two variables in a pandas DataFrame that are used in a calculation (Var1 / Var2). The values are a mix of floating-point numbers and missing values (which I chose to coerce to 0). In my final result I am getting 'inf' values and NA values. The NA values are expected, but how do I get a usable number instead of the 'inf' values?

Some 'inf' values appear when Var1 is a float and Var2 = 0; others appear when both Var1 and Var2 are floats.

My initial approach was to round the floats to 2 significant figures before the calculation but I still received the inf values.

  • Doesn't `np.nan_to_num(var, copy=False)` return a usable number instead of the 'inf'? – Peacepieceonepiece Sep 28 '21 at 00:18
  • I apologize for shouting, but anyway DON'T encode missing values as some specific value such as 0, or -99, or 999 or anything else. Invariably what happens is that somewhere down the line, the "missingness" gets forgotten, and the value is used in some calculation. Assign NA or NaN for missing values, such that any derived value is then also NA or NaN (I forget what the convention for missing values is for pandas). – Robert Dodier Sep 28 '21 at 00:23

1 Answer

You may be getting inf because you are dividing by zero. For example, if var1 = 5 and var2 = 0, then you are computing 5 / 0.

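In pure Python this raises a ZeroDivisionError, but NumPy and pandas follow floating-point rules for array and column arithmetic and do not raise, because one bad row would otherwise stop the whole calculation. Instead, a nonzero number divided by zero gives inf ("infinity") or -inf, and 0 / 0 gives NaN.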
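To make the difference concrete, here is a minimal sketch comparing the two behaviours. The DataFrame and its column names `var1` and `var2` are made up for illustration and are not your actual data.

```python
import numpy as np
import pandas as pd

# Pure Python raises an exception on division by zero.
try:
    5 / 0
    python_result = "no error"
except ZeroDivisionError:
    python_result = "ZeroDivisionError"

# Hypothetical data: every denominator is 0, as if missing values
# in var2 had been coerced to 0.
df = pd.DataFrame({"var1": [5.0, -2.0, 0.0], "var2": [0.0, 0.0, 0.0]})

# pandas does not raise; it follows floating-point rules instead.
ratio = df["var1"] / df["var2"]

print(python_result)
print(ratio)  # positive / 0 is inf, negative / 0 is -inf, 0 / 0 is NaN
```

Note that the third row comes out as NaN rather than inf, so coercing missing values to 0 in both columns can produce either result.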

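When var1 and var2 are both nonzero floats, it may be that var2 is extremely small, which makes var1 / var2 extremely large. A 64-bit float cannot represent values above roughly 1.8e308, so a result beyond that limit overflows and is stored as inf. It is also worth checking whether var2 is exactly 0.0 in those rows, since 0.0 is itself a float.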
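As a rough illustration of the overflow case, the sketch below uses made-up values chosen so that the true quotient (1e600) is larger than any 64-bit float. An ordinary small denominator does not trigger this; the quotient has to exceed the float64 maximum.

```python
import numpy as np

# Largest finite 64-bit float, about 1.8e308.
max_float = np.finfo(np.float64).max

# Hypothetical extreme values: the exact quotient would be 1e600.
var1 = np.array([1e300])
var2 = np.array([1e-300])

# Silence NumPy's overflow warning for this demonstration only.
with np.errstate(over="ignore"):
    overflowed = var1 / var2

# A merely small denominator gives a large but finite result.
ordinary = np.array([1.0]) / np.array([1e-10])

print(max_float)
print(overflowed)  # inf, because 1e600 is not representable
print(ordinary)    # finite
```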

Rounding doesn't help: if var2 = 0, it is still 0 after rounding, and if var2 is very small, rounding it to a couple of decimal places turns it into 0 as well. Either way you are still dividing by zero, which is what produces the inf.
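Here is a small sketch of the rounding problem, followed by two possible ways to avoid inf in the result. The data and column names are made up, and whether NaN counts as a "usable number" depends on what you do downstream, so treat this as one option rather than the definitive fix. It follows the suggestion in the comments to keep missing values as NaN instead of 0.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one zero denominator, one very small, one ordinary.
df = pd.DataFrame({"var1": [5.0, 3.0, 1.0], "var2": [0.0, 0.001, 2.0]})

# Rounding to two decimal places turns 0.001 into 0.0, so the first two
# rows both become inf.
rounded = df["var1"] / df["var2"].round(2)

# Option A: treat zero denominators as missing before dividing,
# so those rows come out as NaN instead of inf.
safe = df["var1"] / df["var2"].replace(0, np.nan)

# Option B: divide first, then turn any inf or -inf into NaN.
cleaned = (df["var1"] / df["var2"]).replace([np.inf, -np.inf], np.nan)

print(rounded)
print(safe)
print(cleaned)
```

Both options leave the genuinely computable rows untouched and mark the rest as missing, which pandas functions such as `mean()` skip by default.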