I am building a data pipeline in pandas, where any given number can be one of the following:

  • Explicitly 0
  • Missing (`np.nan`)
  • Missing, but imputed to have a value of 0

I'd like to replace the imputed 0s in the table with a value that indicates they have been imputed. The best I could think of was a zero-like object whose repr is "-".

I tried:

class MyNaNThatIsLikeZero(int):
    def __repr__(self):
        return '-'

myNaN = MyNaNThatIsLikeZero()

But pandas converts this to the plain float 0.0.

To clarify, I need this value to behave as 0 in computations, so it needs to implement all the numeric dunder methods.

MYK
  • You must understand, usually and ideally, **there are no `int` or `float` objects in your dataframe at all**. Just add a column with a categorical variable detailing this distinction – juanpa.arrivillaga Jan 18 '22 at 15:39
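
A minimal sketch of what the comment above suggests, under some assumptions: the column names `value` and `value_imputed` are hypothetical, and a plain boolean flag stands in for the "categorical variable" (a `pd.Categorical` with labels such as "observed"/"imputed" would work the same way). The numeric column stays a regular float column, and the "-" marker only appears in a separate display view.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one explicit 0, two missing values, one ordinary value.
df = pd.DataFrame({"value": [0.0, np.nan, 2.5, np.nan]})

# Record which entries are missing *before* imputing them.
df["value_imputed"] = df["value"].isna()

# Impute with 0; the column keeps its float dtype, so arithmetic stays fast.
df["value"] = df["value"].fillna(0.0)

# For display only: show "-" wherever the value was imputed.
display_view = df["value"].astype(object).where(~df["value_imputed"], "-")

print(df)
print(display_view)
```

Computations run on `df["value"]`, while the flag column keeps explicit zeros and imputed zeros distinguishable.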

1 Answer

> pandas is converting this to just the float 0.0

You can disable that conversion by passing `dtype=object` to the `DataFrame` constructor. See my other answer: https://stackoverflow.com/a/66907030/13622927
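
A rough sketch of how that might look, assuming a class like the one in the question (renamed `ImputedZero` here for illustration; the column name `x` is also made up):

```python
import pandas as pd


class ImputedZero(int):
    """An int subclass that equals 0 but has '-' as its repr."""

    def __repr__(self):
        return "-"


imputed = ImputedZero()

# With dtype=object, pandas stores references to the original Python
# objects instead of converting them to a NumPy numeric dtype.
df = pd.DataFrame({"x": [1, imputed, 3]}, dtype=object)

cell = df.loc[1, "x"]
print(df["x"].dtype)        # object
print(type(cell).__name__)  # ImputedZero
print(cell + 5)             # 5
```

Two caveats: the result of arithmetic such as `cell + 5` is a plain `int`, so the marker does not survive computations, and object columns lose the vectorized speed of numeric dtypes.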

Expurple