Questions tagged [anonymize]

An anonymizer is a tool that de-identifies or masks data or events so that they can no longer be traced back to an individual. It can be a proxy server that acts as a shield between the internet and the local network, or a tool that reads data from a source and writes an untraceable version to the destination (for example, by removing sensitive or protected information).

79 questions
3
votes
3 answers

Anonymizing data / replacing names

Normally I anonymize my data by using hashlib and the .apply(hash) function. Now I'm trying a new approach. Imagine I have the following df called 'data': df = pd.DataFrame({'contributor':['eric', 'frank', 'john', 'frank', 'barbara'], …
Erfan
  • 40,971
  • 8
  • 66
  • 78
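
One possible way to replace names with stable placeholder labels, sketched in Python with pandas. Only the `contributor` column from the excerpt is used, because the rest of the DataFrame is elided. The helper name `pseudonymize_column` and the `contributor_N` label format are assumptions made for illustration, not the asker's approach.

```python
import pandas as pd


def pseudonymize_column(series: pd.Series, prefix: str) -> pd.Series:
    """Replace each distinct value with a label such as 'contributor_0'.

    Hypothetical helper: the same input value always gets the same label,
    so grouping and counting by contributor still work afterwards.
    """
    # factorize assigns one integer code per distinct value,
    # in order of first appearance (missing values get -1)
    codes, _uniques = pd.factorize(series)
    labels = [f"{prefix}_{code}" for code in codes]
    return pd.Series(labels, index=series.index, name=series.name)


# Only the column shown in the excerpt; other columns are unknown
data = pd.DataFrame({"contributor": ["eric", "frank", "john", "frank", "barbara"]})
data["contributor"] = pseudonymize_column(data["contributor"], "contributor")
print(data["contributor"].tolist())
```

Both occurrences of 'frank' end up with the same label. The labels reveal the order in which names first appear, so shuffling the codes may be worth considering if that order is itself sensitive.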
3
votes
0 answers

Removing object metadata in R

I'm writing some code to anonymize an R dataset in such a way that it strips any useful information out of the data while preserving the structure that would be important for running regressions, etc. on it. I want to be sure I've removed all…
Empiromancer
  • 3,778
  • 1
  • 22
  • 53
3
votes
1 answer

How to create anonymized email address

I need to implement an anonymized email address feature on my website, much like Airbnb and HomeAway do. They create an anonymous address for each conversation between renter and landlord. For example, HomeAway has…
user1347271
  • 127
  • 1
  • 8
3
votes
2 answers

Anonymization of Account Numbers in 2TB of CSVs

I have ~2TB of CSVs where the first 2 columns contain two ID numbers. These need to be anonymized so the data can be used in academic research. The anonymization can be (but does not have to be) irreversible. These are NOT medical records, so I do…
cataclysmic
  • 337
  • 2
  • 12
2
votes
3 answers

How to store data in db so that nobody with access to it can understand it?

We are soon releasing a private beta of a domestic economy website. The website of course gathers information about a user's (identified by email only) private financial situation: salary, rent, bills, mortgages, etc. All of these are really sensitive…
tobefound
  • 1,091
  • 2
  • 9
  • 15
2
votes
1 answer

Data masking in relational table using R

I am trying to mask data in such a way that referential integrity is not compromised. My table Customer has this data: Customer table Customer_ID | Customer_Name | Address | Phone | Product_ID 143 |…
Deep
  • 528
  • 3
  • 12
  • 27
2
votes
3 answers

Which algorithm for hashing name, firstName and birth-date of a person

I have to save the combination of last name, first name and birth date of a person as a hash. This hash is later used to search for the same person with exactly the same properties. My question is whether SHA-1 is a meaningful algorithm for this. As far…
HCL
  • 36,053
  • 27
  • 163
  • 213
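
A minimal Python sketch of deriving a searchable key from last name, first name and birth date. It deliberately swaps in a keyed hash (HMAC-SHA-256) instead of the SHA-1 mentioned in the question, on the assumption that a secret key is acceptable, since names and dates are easy to guess by brute force. The function name `person_key`, the normalization rules and the demo secret are all hypothetical.

```python
import hashlib
import hmac
import unicodedata
from datetime import date


def person_key(last_name: str, first_name: str, birth_date: date, secret: bytes) -> str:
    """Return a hex digest that is equal for equal (normalized) inputs."""

    def normalize(text: str) -> str:
        # Assumed rules: Unicode NFKC, trimmed, case-insensitive
        return unicodedata.normalize("NFKC", text).strip().casefold()

    # A separator keeps ("ab", "c") and ("a", "bc") from colliding
    message = "\x1f".join(
        [normalize(last_name), normalize(first_name), birth_date.isoformat()]
    )
    return hmac.new(secret, message.encode("utf-8"), hashlib.sha256).hexdigest()


secret = b"demo-secret-change-me"  # placeholder; a real key would be stored securely
key_a = person_key("Doe", "Jane", date(1980, 5, 17), secret)
key_b = person_key(" doe ", "JANE", date(1980, 5, 17), secret)
print(key_a == key_b)
```

Lookups then compare digests only. Anyone holding the secret can still test guesses, so the key has to be protected as carefully as the names themselves.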
2
votes
1 answer

How to setup gtag anonymize_ip? Am I doing it wrong?

So by reading this little page, I first thought I just had to add the following line to my gtag script and everything would be fine: gtag('config', '', { 'anonymize_ip': true }); But today I realized that, maybe I…
Presence
  • 115
  • 9
2
votes
2 answers

What is the most efficient & pythonic way to recode a pandas column?

I'd like to 'anonymize' or 'recode' a column in a pandas DataFrame. What's the most efficient way to do so? I wrote the following, but it seems likely there's a built-in function or better way. dataset =…
user1318135
  • 717
  • 2
  • 12
  • 36
2
votes
2 answers

In R - how do I replace all letters in a string with other letters?

I need to anonymize names but in a very specific way so that the format of the entire string is still the same (spaces, hyphens, periods are preserved) but all the letters are scrambled. I want to consistently replace say all A's with C's, all D's…
SBala
  • 85
  • 3
2
votes
2 answers

Anonymize names in paragraph variable by matching and replacement

I am analyzing a school's student report card database. My dataset consists of around 3000 records structured similarly to the example below. Each observation is one teacher's assessment of one student. Each observation contains a three-sentence…
Anders Swanson
  • 3,637
  • 1
  • 18
  • 43
2
votes
1 answer

Find credit card numbers and replace characters at set positions

I have a file that contains credit card numbers (16 characters). I want to find them and replace everything with "X" apart from the first 6 and last 4 digits. sed -i 's/\([345]\{1\}[0-9]\{3\}\|6011\)\{1\}[ -]\?[0-9]\{4\}[…
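
The masking rule described above (keep the first 6 and last 4 digits, replace the middle 6 with "X") can be sketched in Python rather than sed. This is a simplified illustration: it assumes the 16 digits are contiguous and does not reproduce the issuer prefixes or the optional separators of the truncated sed expression. The name `mask_card_numbers` is hypothetical.

```python
import re

# Exactly 16 digits, not embedded in a longer run of digits
CARD_RE = re.compile(r"(?<!\d)(\d{6})\d{6}(\d{4})(?!\d)")


def mask_card_numbers(text: str) -> str:
    """Replace the middle 6 digits of every 16-digit number with X."""
    return CARD_RE.sub(r"\g<1>XXXXXX\g<2>", text)


print(mask_card_numbers("paid with 4111111111111111 today"))
# → paid with 411111XXXXXX1111 today
```

Numbers with more or fewer than 16 digits are left untouched, which avoids masking parts of longer identifiers.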
2
votes
1 answer

How to anonymize SVN Dump

In France it's important to respect privacy in order to comply with the CNIL recommendations. The SVN property svn:author keeps a trace of every person who has committed changes to the repository. The CNIL's recommendations advise anonymizing the…
Sylvain
  • 61
  • 1
  • 5
1
vote
3 answers

Data ID pseudonymization

I need to pseudonymize IDs in a dataset in order to comply with GDPR. The IDs in question are integers from 0 to 10^7. I am looking for some elegant way to achieve this. The process must be repeatable and easily transferable, therefore I would like…
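
A repeatable mapping over integers from 0 to 10^7 can be sketched as a keyed affine permutation, written here in Python. This is only an illustration of a bijective, reproducible recoding: the function name `make_pseudonymizer` and the two key values are assumptions, and an affine map can be reversed by anyone who learns a few original/pseudonym pairs, so it is not a strong pseudonymization scheme on its own.

```python
from math import gcd

ID_SPACE = 10**7 + 1  # IDs run from 0 to 10^7 inclusive


def make_pseudonymizer(multiplier: int, offset: int, modulus: int = ID_SPACE):
    """Build encode/decode functions for x -> (multiplier * x + offset) mod modulus."""
    # The map is a bijection only if multiplier and modulus share no factor
    if gcd(multiplier, modulus) != 1:
        raise ValueError("multiplier must be coprime to modulus")
    inverse = pow(multiplier, -1, modulus)  # modular inverse, Python 3.8+

    def check(value: int) -> None:
        if not 0 <= value < modulus:
            raise ValueError(f"ID {value} is outside 0..{modulus - 1}")

    def encode(original_id: int) -> int:
        check(original_id)
        return (multiplier * original_id + offset) % modulus

    def decode(pseudonym: int) -> int:
        check(pseudonym)
        return ((pseudonym - offset) * inverse) % modulus

    return encode, decode


# Hypothetical key values; the same pair always reproduces the same mapping
encode, decode = make_pseudonymizer(multiplier=7_654_321, offset=1_234_567)
print(decode(encode(42)) == 42)
```

Because the mapping is a permutation of the ID range, no two IDs share a pseudonym, and keeping the two key values is enough to repeat or undo the process elsewhere.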
1
vote
2 answers

How to shuffle column values while keeping same dataset

I would like to shuffle some columns of a table in a Postgres database. I have 2 million rows. I need to replace every non-null value with another one. I need to keep the same dataset. It's not possible to have the same value two times. It's not…
user3659832
  • 41
  • 1
  • 6