An anonymizer is a tool that de-identifies or masks data or events so that they cannot be traced back to an individual. It can be a proxy server that acts as a shield between the internet and a local network, or a tool that reads data from a source and writes an untraceable version of it to a destination (for example, by removing sensitive or protected information).
Questions tagged [anonymize]
79 questions
3
votes
3 answers
Anonymizing data / replacing names
Normally I anonymize my data by using hashlib and using the .apply(hash) function.
Now I'm trying a new approach. Imagine I have the following df called 'data':
df = pd.DataFrame({'contributor':['eric', 'frank', 'john', 'frank', 'barbara'],
…

Erfan
- 40,971
- 8
- 66
- 78
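For orientation, the hashlib-based baseline the question mentions could look roughly like the sketch below. It is not the asker's code (the excerpt is truncated): `SECRET_KEY` and the `pseudonymize` helper are hypothetical names, only the `contributor` column visible in the excerpt is reproduced, and a keyed HMAC-SHA-256 is used instead of a plain hash because unkeyed hashes of short names are easy to reverse by guessing.

```python
import hashlib
import hmac

import pandas as pd

# Hypothetical secret; in practice it would be loaded from a secure location.
SECRET_KEY = b"replace-with-a-real-secret"


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable, keyed pseudonym for a string value."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:12]  # shortened for readability; this raises collision risk


# Only the column visible in the excerpt is reproduced here.
df = pd.DataFrame({"contributor": ["eric", "frank", "john", "frank", "barbara"]})
df["contributor_anon"] = df["contributor"].apply(pseudonymize)
```

Repeated names (here `frank`) receive the same pseudonym, so group-by and join operations still work on the anonymized column.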
3
votes
0 answers
Removing object metadata in R
I'm writing some code to anonymize an R dataset in such a way that it strips any useful information out of the data while preserving the structure that would be important for running regressions, etc., on it. I want to be sure I've removed all…

Empiromancer
- 3,778
- 1
- 22
- 53
3
votes
1 answer
How to create anonymized email address
I need to implement an anonymized email address feature on my website, much like Airbnb and HomeAway do.
They create an anonymous address for each conversation between renter and landlord.
For example, HomeAway has…

user1347271
- 127
- 1
- 8
3
votes
2 answers
Anonymization of Account Numbers in 2TB of CSVs
I have ~2TB of CSVs where the first two columns contain ID numbers. These need to be anonymized so the data can be used in academic research. The anonymization can be (but does not have to be) irreversible. These are NOT medical records, so I do…

cataclysmic
- 337
- 2
- 12
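One way to approach a job like this is to stream each file row by row and replace only the two ID columns with a keyed digest, so memory use stays constant regardless of file size. The sketch below is an illustration under assumptions, not an answer from the thread: `KEY`, `anonymize_id` and `anonymize_csv` are hypothetical names, the files are assumed to have no header row, and the digest is truncated to 16 hex characters purely for compactness.

```python
import csv
import hashlib
import hmac
import io

# Hypothetical key; discarding it after the run makes the mapping
# impractical to reverse, keeping it allows consistent re-runs.
KEY = b"replace-with-a-real-secret"


def anonymize_id(raw_id: str, key: bytes = KEY) -> str:
    """Map an ID to a stable keyed digest (same input, same output)."""
    return hmac.new(key, raw_id.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def anonymize_csv(src, dst, id_columns=(0, 1)):
    """Stream rows from src to dst, replacing only the ID columns."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    for row in reader:
        for col in id_columns:
            row[col] = anonymize_id(row[col])
        writer.writerow(row)


# Small in-memory demonstration; real files would be opened with newline="".
source = io.StringIO("1001,2002,3.5\n1001,2003,4.1\n")
target = io.StringIO()
anonymize_csv(source, target)
rows = [line.split(",") for line in target.getvalue().splitlines()]
```

Because the mapping is deterministic for a given key, the same account number maps to the same token across all files, which keeps the data usable for joins.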
2
votes
3 answers
How to store data in db so that nobody with access to it can understand it?
We are soon releasing a private beta of a domestic economy website.
The website of course gathers information about a user's (identified by email only) private financial situation: salary, rent, bills, mortgages, etc. All of these are really sensitive…

tobefound
- 1,091
- 2
- 9
- 15
2
votes
1 answer
Data masking in relational table using R
I am trying to mask data in such a way that referential integrity is not compromised.
My table Customer has this data:
Customer table
Customer_ID | Customer_Name | Address | Phone | Product_ID
143 |…

Deep
- 528
- 3
- 12
- 27
2
votes
3 answers
Which algorithm for hashing name, firstName and birth-date of a person
I have to save the combination of last name, first name and birth date of a person as a hash. This hash is later used to search for the same person with exactly the same properties.
My question is whether SHA-1 is a suitable algorithm for this.
As far…

HCL
- 36,053
- 27
- 163
- 213
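As a rough illustration of the idea in this question, the sketch below normalizes the three fields and feeds them to a keyed hash. It deliberately swaps in HMAC-SHA-256 rather than the SHA-1 the asker mentions, and it is not taken from the thread's answers. `PEPPER`, `normalize` and `person_key` are hypothetical names, and the birth date is assumed to arrive as an ISO `YYYY-MM-DD` string.

```python
import hashlib
import hmac
import unicodedata

# Hypothetical secret "pepper"; without it, the small space of
# names and birth dates can be brute-forced from the hashes.
PEPPER = b"replace-with-a-real-secret"


def normalize(text: str) -> str:
    """Canonicalize a name so trivial variations hash identically."""
    return unicodedata.normalize("NFKC", text).strip().casefold()


def person_key(last_name: str, first_name: str, birth_date: str) -> str:
    """Return a keyed SHA-256 digest for (last name, first name, birth date)."""
    # The unit separator keeps ("ab", "c") distinct from ("a", "bc").
    message = "\x1f".join(
        [normalize(last_name), normalize(first_name), birth_date]
    )
    return hmac.new(PEPPER, message.encode("utf-8"), hashlib.sha256).hexdigest()
```

Normalizing before hashing matters here because the hash is used for lookups: `"Doe"` and `" doe "` should find the same person.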
2
votes
1 answer
How to set up gtag anonymize_ip? Am I doing it wrong?
After reading this little page, I first thought I just had to add the following line to my gtag script and everything would be fine:
gtag('config', '', { 'anonymize_ip': true });
But today I realized that, maybe I…

Presence
- 115
- 9
2
votes
2 answers
What is the most efficient & pythonic way to recode a pandas column?
I'd like to 'anonymize' or 'recode' a column in a pandas DataFrame. What's the most efficient way to do so? I wrote the following, but it seems likely there's a built-in function or better way.
dataset =…

user1318135
- 717
- 2
- 12
- 36
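A common way to recode a column into integer labels is `pandas.factorize`. The sketch below uses made-up example data because the question's own dataset is truncated; the column names `name` and `name_code` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical example data; the question's own dataset is not shown in full.
dataset = pd.DataFrame({"name": ["alice", "bob", "alice", "carol"]})

# factorize assigns 0, 1, 2, ... in order of first appearance and
# also returns the distinct original values.
codes, originals = pd.factorize(dataset["name"])
dataset["name_code"] = codes
```

Note that the codes reveal the order in which values first appear; if that matters, the rows could be shuffled before recoding.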
2
votes
2 answers
In R - how do I replace all letters in a string with other letters?
I need to anonymize names but in a very specific way so that the format of the entire string is still the same (spaces, hyphens, periods are preserved) but all the letters are scrambled. I want to consistently replace say all A's with C's, all D's…

SBala
- 85
- 3
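The consistent letter-for-letter replacement described here is a simple substitution, sketched below in Python rather than R. `make_letter_table`, `scramble` and the seed value are hypothetical; only ASCII letters are remapped, so accented characters would pass through unchanged. A substitution like this preserves letter frequencies, so it should be treated as light obfuscation rather than strong anonymization.

```python
import random
import string


def make_letter_table(seed: int) -> dict:
    """Build a fixed random letter-to-letter mapping that preserves case."""
    rng = random.Random(seed)  # seeded so the mapping is repeatable
    shuffled = list(string.ascii_lowercase)
    rng.shuffle(shuffled)
    lower = "".join(shuffled)
    return str.maketrans(
        string.ascii_lowercase + string.ascii_uppercase,
        lower + lower.upper(),
    )


TABLE = make_letter_table(seed=2024)  # hypothetical seed


def scramble(name: str) -> str:
    """Replace ASCII letters; spaces, hyphens and periods pass through."""
    return name.translate(TABLE)
```

Calling `scramble("Anne-Marie O. Smith")` returns a string of the same length with the hyphen, spaces and period in their original positions.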
2
votes
2 answers
Anonymize names in paragraph variable by matching and replacement
I am analyzing a school's student report card database. My dataset consists of around 3000 records structured similarly to the example below. Each observation is one teacher's assessment of one student. Each observation contains a three-sentence…

Anders Swanson
- 3,637
- 1
- 18
- 43
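Matching and replacing known names inside free text can be sketched with a single compiled regular expression. This is an illustration only, written in Python and not taken from the thread: `build_name_pattern`, `anonymize_text` and the placeholder labels are hypothetical, and the approach assumes the list of names to replace is already known.

```python
import re


def build_name_pattern(names):
    """Compile one regex matching any listed name as a whole word."""
    # Longest first, so a longer name wins over a shorter prefix of it.
    ordered = sorted(names, key=len, reverse=True)
    alternatives = "|".join(re.escape(name) for name in ordered)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


def anonymize_text(text, replacements):
    """Replace each known name in text with its placeholder."""
    if not replacements:
        return text
    lookup = {name.casefold(): alias for name, alias in replacements.items()}
    pattern = build_name_pattern(replacements)
    return pattern.sub(lambda m: lookup[m.group(0).casefold()], text)
```

The word boundaries prevent a short name such as `Tom` from being replaced inside an unrelated word such as `atomic`.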
2
votes
1 answer
Find credit card numbers and replace characters at set positions
I have a file that contains credit card numbers (16 characters). I want to find them and replace everything with "X" apart from the first 6 and last 4 digits.
sed -i 's/\([345]\{1\}[0-9]\{3\}\|6011\)\{1\}[ -]\?[0-9]\{4\}[…

Julian Garthwaite
- 23
- 2
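The masking rule described in this question (keep the first 6 and last 4 digits) can be sketched in Python as below. This is not a completion of the truncated `sed` command: `CARD_RE`, `mask_card` and `mask_cards` are hypothetical names, the pattern assumes 16 digits in groups of four with an optional space or hyphen between groups, and it does not check issuer prefixes or the Luhn checksum.

```python
import re

# 16 digits in groups of four, optionally separated by a space or hyphen.
# The lookarounds stop the pattern from matching inside longer digit runs.
CARD_RE = re.compile(r"(?<!\d)\d{4}(?:[ -]?\d{4}){3}(?!\d)")


def mask_card(match: re.Match) -> str:
    """Keep the first 6 and last 4 digits, X out the 6 in between."""
    out = []
    seen = 0  # number of digits consumed so far
    for ch in match.group(0):
        if ch.isdigit():
            seen += 1
            out.append(ch if seen <= 6 or seen > 12 else "X")
        else:
            out.append(ch)  # keep separators exactly as they were
    return "".join(out)


def mask_cards(text: str) -> str:
    """Mask every card-like number found in text."""
    return CARD_RE.sub(mask_card, text)
```

For example, `mask_cards("4111-1111-1111-1111")` yields `4111-11XX-XXXX-1111`, while a 17-digit run is left untouched.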
2
votes
1 answer
How to anonymize SVN Dump
In France it's important to respect privacy in order to comply with CNIL recommendations.
The SVN property svn:author keeps a trace of every person who has committed changes to the repository.
The CNIL's recommendations advise anonymizing the…

Sylvain
- 61
- 1
- 5
1
vote
3 answers
Data ID pseudonymization
I need to pseudonymize the IDs in a dataset in order to comply with GDPR. The IDs in question are integers from 0 to 10^7. I am looking for an elegant way to achieve this. The process must be repeatable and easily transferable, therefore I would like…

klobaska soslaninou
- 11
- 2
1
vote
2 answers
How to shuffle column values while keeping same dataset
I would like to shuffle some columns of a table in a Postgres database. I have 2 million rows. I need to replace every non-null value with another one.
I need to keep the same set of values. It's not possible to have the same value twice. It's not…

user3659832
- 41
- 1
- 6