To reproduce the issue, the notebook, data, and output are here: github link
I have a Contract column in my dataset. Its values all look like numbers, but they are actually categorical.
When I read the data with pandas, df.info() reports the column as int. Since the metadata I received says Contract is a category, I changed the column type manually, like below:
df['Contract'] = df['Contract'].astype('category')
df.dtypes # shows modified dtype now
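For anyone who wants to check the conversion without the linked notebook, here is a minimal sketch. It assumes a small made-up DataFrame (the values below are hypothetical stand-ins, not my real data) and only covers the pandas side, not what pandas_profiling does with the result.

```python
import pandas as pd

# Hypothetical stand-in for the real dataset: integer codes that are
# really categories. These values are made up for illustration.
df = pd.DataFrame({"Contract": [1, 2, 1, 3, 2]})
print(df["Contract"].dtype)  # an integer dtype, as inferred by pandas

# Option 1: convert to pandas' categorical dtype.
# The dtype string is 'category'; 'categorical' raises a TypeError.
df["Contract"] = df["Contract"].astype("category")
print(df["Contract"].dtype)
print(list(df["Contract"].cat.categories))

# Option 2: convert the values to strings instead (kept in a separate Series here).
contract_as_str = df["Contract"].astype(str)
print(contract_as_str.tolist())
```

Both options are the conversions mentioned in this question (category and str), shown side by side so the resulting dtypes can be compared.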
I then tried to generate a report with pandas_profiling. The generated report shows that Contract is interpreted as a real number, even though I changed the type from int to str/category.
from pandas_profiling import ProfileReport

# Tried both; the result was the same.
ProfileReport(df)
df.profile_report()
Can you explain the right way to set data types for pandas_profiling? That is, how do I get the Contract variable treated as a categorical type?