I am working with sensitive data. The sample data in the profile report shows the first 5 rows of the dataset. If you are looking at a profile report for columns such as first_name, last_name, and SSN, you can stitch together 5 people's PII.
I was able to suppress the Sample Data tab with:
profile = ProfileReport(df, title="Profiling Report", samples={"head": 0, "tail": 0})
However, when you click More details, the sample data (first 5 rows) is still displayed.
I was then able to suppress additional data in the report with:
df.profile_report(sensitive=True)
This swings the pendulum too far in the other direction: the distribution of values and other key output are masked as well.
Is there a way to simply have the sample data be 5 records selected at random?
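To illustrate what I mean by "selected at random", here is a minimal sketch using plain pandas. The toy DataFrame and its values are made up for illustration, and the fixed random_state is only there to make the sketch reproducible. It shows the row selection step only; how to get the report to use rows like these is the part I am asking about.

```python
import pandas as pd

# Toy stand-in for the real dataset; all values are made up.
df = pd.DataFrame({
    "first_name": [f"first{i}" for i in range(20)],
    "last_name": [f"last{i}" for i in range(20)],
    "SSN": [f"000-00-{i:04d}" for i in range(20)],
})

# Draw 5 rows at random (without replacement) instead of taking the first 5.
# random_state is fixed only so the sketch is reproducible.
random_rows = df.sample(n=5, random_state=0)
print(random_rows)
```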
Thank you!!!