I can't for the life of me figure out why something that used to be so simple no longer works. I often map a dataframe column through a dictionary, and any values that aren't found among the dictionary keys come back as nulls, so the resulting column ends up as floats plus nulls. Typically I then convert with .astype("Int64")
and boom, the non-nulls are ints rather than floats, with everything else untouched.
Now I'm running into issues where I process my data, apply the Int64 conversions, and the acceptance tests pass, yet further down the pipeline the data deployment fails because floats are found in these columns.
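To be clear about what "floats are found" means: I don't have the exact deployment check in front of me, but my understanding is that it boils down to the column still being float64 instead of a (nullable) integer dtype. Something like this illustrates the distinction I mean (illustrative only, not the actual deployment code):

import pandas as pd

float_col = pd.Series([25.0, None, 305.0, None])            # dtype: float64
int_col = pd.Series([25, None, 305, None], dtype="Int64")   # dtype: Int64
print(pd.api.types.is_integer_dtype(float_col))  # False -> the kind of column that gets flagged
print(pd.api.types.is_integer_dtype(int_col))    # True  -> what I'm trying to end up with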
Just to make sure I'm not insane, I open a Jupyter notebook, initialize a basic dataframe, map it through a dictionary that doesn't contain some of the dataframe's values as keys, then cast to "Int64"... and I still get this issue!! What's going on? I'm sure this used to be so simple...
import pandas as pd

df = pd.DataFrame({"keys": [5, 10, 15, 20]})
# 10 and 20 aren't in the dict, so map() produces NaN there and the column becomes float64
df["after_mapping"] = df["keys"].map({1: 0, 2: 2, 5: 25, 15: 305})
df["after_mapping"] = df["after_mapping"].astype("Int64")
ValueError: Cannot convert non-finite values (NA or inf) to integer
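For reference, this is roughly what I remember (and expect) that last cast producing instead: the mapped values as ints and the missing ones as <NA>, something like

0      25
1    <NA>
2     305
3    <NA>
Name: after_mapping, dtype: Int64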