I am working with sensitive data. Sample data in the profile report shows the first 5 rows from the dataset. If you are looking at a profile report for columns with first_name, last_name, and SSN, you can stitch together 5 people's PII.

I was able to suppress the Sample Data tab with:

 profile = ProfileReport(df, title="Profiling Report", samples={"head": 0, "tail": 0})

However, when you click More details, the sample data (first 5 rows) is still displayed.

I was then able to suppress additional data in the report with:

 df.profile_report(sensitive=True)

This is swinging the pendulum too far in the other direction. The distribution of values and other key output is being masked.

Is there a way to simply have the sample data be 5 records selected at random?

Thank you!!!

Shahab Rahnama
  • Use Faker and create truly fake data representative of your actual data – itprorh66 Aug 29 '23 at 19:06
  • I'm trying to profile my actual data. I want a summarized view of the data, but I cannot expose PII. Looking for the output of the profile report to either suppress just the sample data (because it is showing the first 5 actual rows of data) or choose the sample data from random entries in the column. This would not expose PII in my metadata. Thanks! – Clay McBride Aug 29 '23 at 21:17

1 Answer

No, there isn't. As per their documentation, the report has only two sample sections: first and last records. You can configure how many records are shown, but not how they are selected (the sections are called First and Last).

I would recommend asking for that feature if that's something that matters for what you are developing.
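
In the meantime, one possible workaround is to change the data before it reaches the report rather than the report itself. The sketch below is plain pandas, not a feature of the profiling library, and the helper name shuffle_columns_independently is made up for illustration. It permutes every column on its own, so whichever rows the report shows no longer correspond to a real person. This rests on two assumptions you should check against your requirements: individual real values (a single SSN, for example) can still appear, only unlinked from the other columns, and anything the report computes across columns (correlations, interactions, duplicate rows) is no longer meaningful. Per-column statistics and distributions are unchanged.

```python
from typing import Optional

import pandas as pd


def shuffle_columns_independently(df: pd.DataFrame, seed: Optional[int] = None) -> pd.DataFrame:
    """Return a copy of df in which each column is permuted separately.

    Hypothetical helper: assumes column names are unique.
    """
    shuffled = {}
    for i, col in enumerate(df.columns):
        # Use a different random_state per column so the permutations differ
        col_seed = None if seed is None else seed + i
        # sample(frac=1) returns all values of the column in random order;
        # to_numpy() drops the shuffled index so values are placed positionally
        shuffled[col] = df[col].sample(frac=1, random_state=col_seed).to_numpy()
    return pd.DataFrame(shuffled, index=df.index)
```

You would then pass the result to the report in place of the original frame, e.g. ProfileReport(shuffle_columns_independently(df), title="Profiling Report").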

FabC