I'm aware that newer versions of pandas have a special extension data type, 'Int64', that allows missing values to coexist with integers in the same column; this topic explains as much. However, I'd like to have an integer column that also allows infinity values. When I try to add float('inf') to a column of type 'Int64', I get an error: "cannot safely cast non-equivalent float64 to int64".
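
A minimal sketch of the failure described above, assuming a reasonably recent pandas. The exact exception type and message may differ between versions, so the code only records whether construction was rejected; the variable names are illustrative.

```python
import pandas as pd

# Nullable integers: a missing value is fine in an "Int64" array.
ok = pd.array([1, 5, None], dtype="Int64")

# Infinity has no integer representation, so the cast is rejected.
try:
    pd.array([1, 5, float("inf")], dtype="Int64")
    inf_rejected = False
except (TypeError, ValueError, OverflowError) as err:
    inf_rejected = True
    print(type(err).__name__, err)

print(ok.dtype, inf_rejected)
```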

The reason I want infinity values in my column is that I have a column of integer distances, and while some of these distances are unknown, it is known that they are over, say, 3000 meters. This makes a difference when I calculate the median of the column. For example, the median of the array [1, 5, 10, 20, 50, nan, nan, nan] is 10, because NaN values are ignored. But the median of the array [1, 5, 10, 20, 50, inf, inf, nan] is 20, because inf values are considered larger than 50.
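
The two medians above can be reproduced with a short sketch. It assumes plain float64 Series (the variable names are made up for illustration), which also shows the upcast to float that the question is trying to avoid.

```python
import numpy as np
import pandas as pd

with_nan = pd.Series([1, 5, 10, 20, 50, np.nan, np.nan, np.nan])
with_inf = pd.Series([1, 5, 10, 20, 50, np.inf, np.inf, np.nan])

# Both columns are upcast to float64 because NaN and inf are floats.
print(with_nan.dtype, with_inf.dtype)

# median() skips NaN by default (skipna=True) but keeps inf in the ordering.
median_nan = with_nan.median()
median_inf = with_inf.median()
print(median_nan, median_inf)
```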

Is there any way to make integers compatible with 'inf' values? Or do I have to make do with float?

UchuuStranger
  • Can you use the maximum value available in int64 instead of inf? If your purpose is to get the median of an array, it won't change the result much. – Ben.T Apr 25 '20 at 23:22
  • Floats are really integers too. It's just that specific values were chosen to represent infinity and nan. You can assign two bit patterns to do the same. You'll have to implement your own processing to identify and mask these values. – Mad Physicist Apr 26 '20 at 00:56
  • A pandas Series can have object dtype and contain ints, np.nan, np.inf, None and/or strings. – hpaulj Apr 26 '20 at 03:50
  • @hpaulj I tried your suggestion. It's true that int, nan and inf can all be stored in a column of "object" type, but this solution appears to be very unstable and only fit for read-only columns. Not only will such a column accept actual real numbers without complaint, which it shouldn't; a less obvious problem is that even if we're very careful not to let that happen, the column still reverts to "float64" at any chance it gets. For example, the "fillna()" method resets the type to whatever it sees fit, ignoring the types of both the filling series and the column being filled. – UchuuStranger Apr 27 '20 at 14:56
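
The sentinel idea from the first comment could look roughly like this. It is only a sketch: FAR is a hypothetical name for the largest int64 value standing in for "known to be over the threshold", and it assumes the column is only used for order statistics such as the median.

```python
import numpy as np
import pandas as pd

# Hypothetical sentinel meaning "known to be over the threshold".
FAR = np.iinfo(np.int64).max

distances = pd.Series([1, 5, 10, 20, 50, FAR, FAR, None], dtype="Int64")
print(distances.dtype)

# The sentinel sorts above every real distance; <NA> is skipped.
median = distances.median()
print(median)

# Sentinels must be filtered out before any arithmetic such as sum or mean.
is_known = (distances != FAR).fillna(False)
known = distances[is_known]
print(known.sum())
```

The column keeps its 'Int64' dtype, but nothing stops the sentinel from leaking into sums or means, so every such calculation needs the explicit filter shown at the end.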

1 Answer

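math.inf should work in your case:

import math
import statistics as stat

test = math.inf
a = [1, 5, 10, 20, 50, test, test, float("nan")]
# statistics.median does not skip NaN, so drop it before taking the median
clean = [x for x in a if not math.isnan(x)]
print(stat.median(clean))  # prints 20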
lostin
  • I think you're kind of missing the point here. Yes, math.inf is equivalent to float('inf'), but regardless of which one we use, once we try to turn this list into a Series, or a column of a DataFrame, it will adopt the "float64" type, and that in turn will convert all integers to real numbers. – UchuuStranger Apr 27 '20 at 14:43
  • Yes, you are right, the above solution doesn't really work with DataFrames. – lostin Apr 27 '20 at 17:01