I am getting the following error when running a profile report for a subset of my DataFrame.

ValueError: Value '6.180529706513958' should be a ratio between 1 and 0.

This works:

profile = ProfileReport(
    df, title="Profile Report of the January Conversion Dataset"
)
profile.to_file(Path("../../../products/jan_cvr_report.html"))

profile0 = ProfileReport(
    df[df['conversion']==0], title="Profile Report of the January Conversion==0 Dataset"
)
profile0.to_file(Path("../../../products/jan_cvr0_report.html"))

This does not:

profile1 = ProfileReport(
    df[df['conversion']==1], title="Profile Report of the January Conversion==1 Dataset"
)
profile1.to_file(Path("../../../products/jan_cvr1_report.html"))
Climbs_lika_Spyder

1 Answer

I found a closed GitHub issue with a suggestion that I got to work. My details and stack trace are posted there.

Solution: remove_unused_categories

df1 = df[df['conversion']==1].copy(deep=True)
# inplace=True was removed in pandas 2.0, so assign the result back instead
df1['user_id'] = df1['user_id'].cat.remove_unused_categories()

After running the above, the profile report worked fine. The classes are extremely unbalanced, so when subsetting to just the rows where conversion == 1, most of the user_id categories are no longer used. This is also fixable by not storing user_id as a category. However, the same problem could come up with other categorical columns, so I am sharing it anyway.
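
Since the same thing can happen with any categorical column, here is a minimal sketch that applies the fix to every categorical column of a subset. The helper name `drop_unused_categories` and the toy data are made up for illustration; only `select_dtypes` and `cat.remove_unused_categories` are pandas calls.

```python
import pandas as pd


def drop_unused_categories(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy whose categorical columns keep only the categories present."""
    out = frame.copy()
    for col in out.select_dtypes(include="category").columns:
        out[col] = out[col].cat.remove_unused_categories()
    return out


# Toy stand-in for the real data: heavily unbalanced 'conversion' column.
df = pd.DataFrame(
    {
        "user_id": pd.Categorical(["u1", "u2", "u3", "u4"]),
        "plan": pd.Categorical(["a", "b", "a", "b"]),
        "conversion": [0, 0, 0, 1],
    }
)

subset = df[df["conversion"] == 1]
# Filtering rows does not shrink the category list of a categorical column.
before = {c: len(subset[c].cat.categories) for c in ["user_id", "plan"]}

cleaned = drop_unused_categories(subset)
after = {c: len(cleaned[c].cat.categories) for c in ["user_id", "plan"]}

print(before, after)
```

The cleaned frame can then be passed to ProfileReport in place of the raw subset.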

Climbs_lika_Spyder
  • So, this error seems to be a helpful hint that a class distribution should be made more balanced by grouping, for example with a custom converter. In my case, I had an employment status feature with a cardinality of 9; reducing that to 'employed', 'unemployed', and 'retired' did the trick (although you lose information). – Francis Laclé Dec 09 '21 at 18:56
  • Update: encountered it again, and it got fixed by setting `duplicates=None` for categorical data as explained here: https://pandas-profiling.github.io/pandas-profiling/docs/master/rtd/pages/advanced_usage.html#configuration-shorthands - makes sense. – Francis Laclé Dec 09 '21 at 19:18