My idea is to find every email address in a sentence and replace it with a different random one (anonymization). But I can't get the result I want: either every email is replaced with the same one, or I get an error (list index out of range).

input: email = "daniel@hotmail.com sent it to ana@gmail.com"

output I want: email = "albert@hotmail.com sent it to john@gmail.com"

random_emails = ["albert", "john", "mary"]


def find_email(email: str):
    result = email
    i = 0
    email_address = r"\S+@"
    for text in email:
            result = re.sub(email_address, random_emails[i] + "@", result)
            i += 1
    return result

print(find_email(email))
  • Do you have a [mcve]? – AMC Jan 18 '20 at 01:01
  • As it stands, I think the organization/design of your program could really be improved. I would recommend working on that before anything else. Oh, and your regex to find emails is woefully incomplete, although I'm guessing you were aware of that. – AMC Jan 18 '20 at 01:03

2 Answers

I found a solution, but note that identical emails will be anonymized in the same way, and the list must contain at least as many names as there are emails. Try this:

import re

email = "daniel@hotmail.com sent it to ana@gmail.com"
random_emails = ["albert", "john", "mary"]

def find_email(email: str):
    result = email
    i = 0
    email_address = r"\S+@"
    regex_matches = re.findall(email_address, email)
    for match in regex_matches:
        result = result.replace(match, random_emails[i] + "@")
        i += 1
    return result

print(find_email(email))
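
If repeated addresses should also get different names, one option is to pass a function to `re.sub`, which is then called once per match. The sketch below is only an illustration: the helper name `anonymize` and the use of `itertools.cycle` (so the name list never runs out, at the cost of reusing names) are assumptions, not part of the code above.

```python
import itertools
import re

random_emails = ["albert", "john", "mary"]


def anonymize(text, names=random_emails):
    # Hypothetical helper: cycle() restarts the list when it is exhausted,
    # so there is no "list index out of range" for long inputs.
    name_iter = itertools.cycle(names)
    # re.sub calls the lambda once per match, so every occurrence
    # (even a repeated address) gets the next name in the cycle.
    return re.sub(r"\S+@", lambda m: next(name_iter) + "@", text)


print(anonymize("daniel@hotmail.com sent it to ana@gmail.com"))
```

The names are taken in list order here, so the result is deterministic; shuffle the list first if the order should be random.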
Phoenixo

You don't need a for loop, and I think your regex can be improved:

import re


def find_email(email):
    result = email
    email_address = r"(\w+@)(\w+.* )(\w+@)(\w+.*)"
    a='AAAAA@'
    b='BBBBB@'
    result = re.sub(email_address, rf'{a}\2{b}\4', result)
    return result


email = "daniel@hotmail.com sent it to ana@gmail.com"
print(find_email(email))

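Explanation:

The pattern defines four capture groups:

\1 = user name of the first email (including the @)
\2 = domain of the first email plus the text in between
\3 = user name of the second email (including the @)
\4 = domain of the second email (e.g. gmail.com)

Now you just need to replace groups \1 and \3 with whatever you want and keep \2 and \4 as they are. Note that this pattern only handles a sentence with exactly two emails.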
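
To see what each group captures, a quick check with `re.search` on the sample sentence can help. This is a small sketch using the same pattern as above; the variable names are only illustrative.

```python
import re

pattern = r"(\w+@)(\w+.* )(\w+@)(\w+.*)"
text = "daniel@hotmail.com sent it to ana@gmail.com"

match = re.search(pattern, text)
# Print each capture group next to its number (1 to 4).
for number, group in enumerate(match.groups(), start=1):
    print(number, repr(group))
# 1 'daniel@'
# 2 'hotmail.com sent it to '
# 3 'ana@'
# 4 'gmail.com'
```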

Example 2: your new routine

import re
from random import randint

random_emails = ["albert", "john", "mary"]


def find_email(email):
    result = email
    email_address = r"(\w+@)(\w+.* )(\w+@)(\w+.*)"
    first = randint(0, 2)
    second = randint(0, 2)
    while first == second:
        second = randint(0, 2)
    result = re.sub(email_address, rf'{random_emails[first]}@\2{random_emails[second]}@\4', result)
    return result


email = "daniel@hotmail.com sent it to ana@gmail.com"
print(find_email(email))

I used random to generate a random index for picking names from the list. The "while first == second:" loop just makes sure the first and second emails don't get the same name.
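
The same idea can be extended to any number of emails. The following is a minimal sketch, not a definitive implementation: the helper name `anonymize` and the simplified pattern `[\w.+-]+@` are assumptions, and `random.sample` is used in place of the `while` loop to draw distinct names.

```python
import random
import re

random_emails = ["albert", "john", "mary"]


def anonymize(text, names=random_emails):
    # Simplified pattern (an assumption): matches only the part before the @.
    pattern = re.compile(r"[\w.+-]+@")
    count = len(pattern.findall(text))
    # random.sample draws `count` distinct names; it raises ValueError
    # if the text contains more emails than there are names.
    picks = iter(random.sample(names, count))
    # The lambda is called once per match and consumes one name each time.
    return pattern.sub(lambda m: next(picks) + "@", text)


print(anonymize("daniel@hotmail.com sent it to ana@gmail.com"))
```

Because the names are drawn without replacement, no two emails in the sentence receive the same name, and the domains are left untouched.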

Rica Gurgel
  • I used random to generate a random index for picking names from the list, and "while first == second:" just so that the two randomly picked names are not the same. – Rica Gurgel Jan 18 '20 at 01:34