
Say I have a text file with the following data:

Name   Year
John Scully   1966
Alex Klaus   1961
....
....

Is there an algorithm or method to which I can pass the concatenated first and last name, using the birth year as a cipher code, to generate an encrypted string for each row?

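from some_library import some_method  # placeholder for the library I am looking for

def encrypt_input(joined_name, year):
    # joined_name: concatenated first and last name; year: birth year used as the cipher code
    output = some_method(joined_name, year)
    return output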

Hypothetical expected output:

Name   Year   EncryptedString
John Scully   1966   123abcd123456
Alex Klaus   1961   43417hfahg678
....
....

This way I can anonymize the data for each row. Even if someone gets the encrypted string, without the proper Year (the cipher code) they could only generate all possible strings, and none of them would be a meaningful name until the years have been tested one by one.

I have looked into AES and other encryption algorithms, but they only take one cipher code, and they are very long and feel like overkill for this.

I can just write my own function to work this out, but if there is already a library for this kind of work I would like to know. I haven't found any so far.

I am looking for a solution in Python if possible, but any kind of insight is helpful.
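
To make the intended interface concrete, here is a minimal sketch using only the Python standard library. It is an illustration under assumptions, not a vetted scheme: `encrypt_name`, `decrypt_name` and `_keystream` are hypothetical names, the year is stretched into a keystream with SHAKE-256 and XORed with the name, and rows that share a year reuse the same keystream, which is a known weakness of this kind of construction.

```python
import hashlib


def _keystream(year: int, length: int) -> bytes:
    # Stretch the year into `length` pseudo-random bytes with SHAKE-256.
    return hashlib.shake_256(str(year).encode("utf-8")).digest(length)


def encrypt_name(joined_name: str, year: int) -> str:
    # XOR the UTF-8 bytes of the name with the year-derived keystream
    # and return the result as a hex string.
    data = joined_name.encode("utf-8")
    stream = _keystream(year, len(data))
    return bytes(a ^ b for a, b in zip(data, stream)).hex()


def decrypt_name(token: str, year: int) -> str:
    # XOR again with the same keystream to undo the encryption.
    # A wrong year gives garbage, so undecodable bytes are replaced.
    data = bytes.fromhex(token)
    stream = _keystream(year, len(data))
    plain = bytes(a ^ b for a, b in zip(data, stream))
    return plain.decode("utf-8", errors="replace")
```

Calling `decrypt_name(encrypt_name("JohnScully", 1966), 1966)` gives back the original name, while any other year produces unreadable text.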

everestial007
  • What is the function of this "anonymized string"? – President James K. Polk Jan 07 '21 at 22:59
  • The function is `fx(name, year) = anonymizedString`, and I can get back the original string with `gx(anonymizedString, year) = name`. While `gx` is able to generate a string from any cipher code (year in this case), it will produce a meaningful name only if the original year is known. – everestial007 Jan 07 '21 at 23:03
  • It's trivial to try every year from 1900 to 2021, and see which one yields a readable name. – user3386109 Jan 07 '21 at 23:35
  • @user3386109 I am only trying to check if there is an algorithm or method that would take a `string` with `cipher` and convert it into an `anonymous string` and recode back to `string` if the same `cipher` code was given. – everestial007 Jan 07 '21 at 23:41
  • Note: `gx(anonymizedString, year)` means you're talking about encryption and decryption, which is basically _completely unrelated to hashing_. Remove all references to "hashes" and "AES" to get a better question and better answers. Actually, upon rereading, that's probably the answer: hashing isn't reversible. You want encryption. – Mooing Duck Jan 07 '21 at 23:52
  • https://en.wikipedia.org/wiki/Outline_of_cryptography#Ciphers – user3386109 Jan 07 '21 at 23:55
  • Sounds to me like you are searching for AES-CMAC: https://pycryptodome.readthedocs.io/en/latest/src/hash/cmac.html – Robert Jan 08 '21 at 11:08
  • As @user3386109 pointed out, the year has such low entropy that an attacker who gets an anonymized string is able to brute force the years and deduce the correct names. Modern encryption works on bits, so most tried years will give some unintelligible output and only one will give a name that also looks like a name. To handle that, you could transform each first name and last name separately into numbers via a bijective mapping. That way, brute forcing will likely not work, because wrong numbers will map back to different, but possibly existing, first name + last name combinations. – Artjom B. Jan 08 '21 at 17:28
  • I'm not sure that this solves your issue, because modern encryption doesn't preserve the length of numbers that well. Also, good luck coming up with that bijective name mapping. – Artjom B. Jan 08 '21 at 17:30
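
The two points raised in the comments above can be sketched as follows. This is a toy example with assumed details: `toy_encrypt` and `toy_decrypt` are a hypothetical year-keyed XOR cipher (not a recommended scheme), the 1900 to 2021 range is taken from the comment above, and `NAMES` is a made-up closed list standing in for a real bijective name mapping. The first part shows the brute-force attack; the second shows how mapping names to indices makes every year decode to some plausible name.

```python
import hashlib


def _keystream(year: int, length: int) -> bytes:
    # Stretch the year into `length` pseudo-random bytes with SHAKE-256.
    return hashlib.shake_256(str(year).encode("utf-8")).digest(length)


def toy_encrypt(name: str, year: int) -> bytes:
    data = name.encode("utf-8")
    return bytes(a ^ b for a, b in zip(data, _keystream(year, len(data))))


def toy_decrypt(token: bytes, year: int) -> bytes:
    return bytes(a ^ b for a, b in zip(token, _keystream(year, len(token))))


def brute_force(token: bytes) -> dict:
    # Try every year and keep the ones whose output consists only of
    # ASCII letters and spaces, i.e. "looks like a name".
    hits = {}
    for year in range(1900, 2022):
        plain = toy_decrypt(token, year)
        if plain.replace(b" ", b"").isalpha():
            hits[year] = plain.decode("ascii")
    return hits


# Hypothetical closed list of names; a real mapping would need to cover
# every name that can occur in the data.
NAMES = ["Alex Klaus", "John Scully", "Maria Lopez", "Wei Chen"]


def year_offset(year: int) -> int:
    # Derive a shift in the range 0..len(NAMES)-1 from the year.
    digest = hashlib.sha256(str(year).encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % len(NAMES)


def encode_name(name: str, year: int) -> int:
    # Map the name to its index, then shift the index by the year offset.
    return (NAMES.index(name) + year_offset(year)) % len(NAMES)


def decode_name(code: int, year: int) -> str:
    # Undo the shift; a wrong year still lands on some name in NAMES.
    return NAMES[(code - year_offset(year)) % len(NAMES)]
```

With the XOR cipher, `brute_force(toy_encrypt("John Scully", 1966))` includes the entry for 1966, because wrong years almost always decode to bytes that do not look like a name. With the index mapping, `decode_name(code, year)` returns an entry of `NAMES` for every year, so a readable result no longer tells the attacker which year was correct. The mapping only hides anything if the list is large, and an attacker who knows the list can still narrow the candidates down.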

0 Answers