How can I strictly enforce a Decimal dtype in a pandas DataFrame?

To clarify: I am not looking for weak workarounds, such as rounding every time I write to or read from a column (and hoping that no other operation happened elsewhere that might lead to unwanted results).

I really want to be 100% sure that whatever is written to that column, no matter where it came from, will always have exactly 2 digits after the decimal point, end of story. If a user tries to write a value that violates this, the whole thing should blow up (raising either a TypeError or a ValueError). To avoid theoretical discussions and to motivate the use case a bit: I am dealing with a trading system, so storing anything other than 2 decimal places in that frame is always a hard error.

I have tried to assign a dtype, but without success:

from decimal import Decimal
df[col].astype(Decimal)

Pydantic comes to mind, but if I bake the df into a class (say class MyDfType), do I then need to write my own setter/getter functions in MyDfType for the contained dataframe (MyDfType().df) to ensure that all values written to or read from certain columns are Decimal?
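For illustration, the manual setter/getter approach asked about above could look roughly like the following. This is only a minimal sketch using plain pandas and the standard `decimal` module, not Pydantic; the names `check_two_places`, `GuardedFrame` and `set_column` are hypothetical, and it assumes that every write goes through the wrapper.

```python
from decimal import Decimal

import pandas as pd


def check_two_places(value):
    """Return value unchanged if it is a finite Decimal with exactly 2 decimal places."""
    if not isinstance(value, Decimal):
        raise TypeError(f"expected Decimal, got {type(value).__name__}: {value!r}")
    # as_tuple().exponent is -2 for values such as Decimal("1.50")
    if not value.is_finite() or value.as_tuple().exponent != -2:
        raise ValueError(f"expected exactly 2 decimal places: {value!r}")
    return value


class GuardedFrame:
    """Hypothetical wrapper: writes to guarded columns go through set_column()."""

    def __init__(self, df, decimal_cols):
        self._decimal_cols = tuple(decimal_cols)
        self._df = df.copy()
        self.validate()

    def validate(self):
        # Check every value currently stored in the guarded columns.
        for col in self._decimal_cols:
            for value in self._df[col]:
                check_two_places(value)

    def set_column(self, col, values):
        values = list(values)
        if col in self._decimal_cols:
            for value in values:
                check_two_places(value)
        self._df[col] = values

    @property
    def df(self):
        # Hand out a copy so callers cannot change the stored frame in place.
        return self._df.copy()


frame = GuardedFrame(pd.DataFrame({"price": [Decimal("1.50")]}), ["price"])
frame.set_column("price", [Decimal("2.25")])
```

With this sketch, `frame.set_column("price", [2.25])` raises a TypeError and `frame.set_column("price", [Decimal("2.5")])` raises a ValueError. The column itself still has dtype `object`, so the guarantee holds only as long as nothing bypasses the wrapper.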

KingOtto
  • `Decimal` will not allow you to vectorize any operation. One option that you might not have considered is to use integers and store values as cents. My 2 cents (pun intended :p) – mozway Dec 06 '22 at 13:23
  • You can't do it with Pandas alone. You can use Pydantic or Pandera to define a schema for the dataframe and validate against it – Panagiotis Kanavos Dec 06 '22 at 13:24
  • Hi Panas, how would you implement this with pydantic? Write setters/getters and pass every operation on `df` through those? Or any built-ins I can use? – KingOtto Dec 06 '22 at 13:25
  • @mozway in ETL scenarios, vectorization performance isn't important. Unexpected rounding or values that can't be stored in a database table because they violate constraints are far more annoying – Panagiotis Kanavos Dec 06 '22 at 13:25
  • @KingOtto I've used [Pandera's](https://pandera.readthedocs.io/en/stable/dataframe_schemas.html) Checks and schemas for this, which allow specifying a schema and validating an entire dataframe against it. `Decimal` is one of the available types. A nice trick is that you can have Pandera infer the schema of a dataframe and save it to a Python file for editing – Panagiotis Kanavos Dec 06 '22 at 13:32
  • @PanagiotisKanavos: Wow, this is perfect. I had no clue... thank you so much! The `df` is already sitting in a `Transaction` class, so putting the schema there is great - and so much easier than pydantic or manual setting/getting! – KingOtto Dec 06 '22 at 13:43
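
The integer-cents idea from mozway's comment can be sketched as follows. This is a minimal illustration under assumptions, not code from the thread: the helpers `to_cents` and `from_cents` are hypothetical names, and amounts are assumed to arrive as `Decimal` values with exactly 2 decimal places.

```python
from decimal import Decimal

import pandas as pd


def to_cents(value):
    """Convert a Decimal with exactly 2 decimal places to integer cents."""
    if not isinstance(value, Decimal):
        raise TypeError(f"expected Decimal, got {type(value).__name__}: {value!r}")
    if not value.is_finite() or value.as_tuple().exponent != -2:
        raise ValueError(f"expected exactly 2 decimal places: {value!r}")
    # scaleb(2) shifts the exponent by two, i.e. multiplies by 100 exactly.
    return int(value.scaleb(2))


def from_cents(cents):
    """Convert integer cents back to a Decimal with 2 decimal places."""
    return Decimal(int(cents)).scaleb(-2)


amounts = [Decimal("1.50"), Decimal("2.25")]

# Stored as int64, so sums and comparisons stay vectorized and exact.
cents = pd.Series([to_cents(a) for a in amounts], dtype="int64")
total = from_cents(cents.sum())
```

Here `total` is `Decimal("3.75")`. The strict check happens at the conversion boundary in `to_cents`; the `int64` dtype by itself does not prevent a later assignment from changing the column's dtype, so a check such as `cents.dtype == "int64"` would still be needed wherever the frame is validated.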

0 Answers