I have a function that takes a list of strings and removes anything that matches any of the regexes defined below:
import re

def clean_data(data):
    # Regex for email, punctuation, common words
    regex_list = ['[\w\.-]+@[\w\.-]+', '[^\P{P}-]+', '\band\b|\bor\b|\bnot\b|\ba\b|\ban\b|\bis\b|\bthe\b|\bof\b|\blike\b']
    for i in data:
        for r in regex_list:
            i = re.sub(r, '', i)
    return data
I defined data as follows:
data = ['this is like my name: Bob.', 'my email is bob@gmail.com']
When I run it in the console, this is the output I get:
clean_data(data)
Out[74]: ['this is like my name: Bob.', 'my email is bob@gmail.com']
What am I doing wrong?
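A likely explanation, offered as a sketch rather than a definitive fix: `re.sub` returns a new string, and assigning it to the loop variable `i` only rebinds that name, so the list `data` is never changed and is returned as-is. Separately, the standard-library `re` module does not support Unicode property classes such as `\P{P}`, and the word patterns need raw strings so that `\b` is read as a word boundary rather than a backspace character. The version below collects the cleaned strings in a new list. The names `PATTERNS`, `cleaned` and `item` are illustrative, and the ASCII-only punctuation class built from `string.punctuation` (with the hyphen kept, as the original pattern appears to intend) is an assumption swapped in for `\P{P}`.

```python
import re
import string

# Assumption: ASCII punctuation (minus the hyphen) stands in for the
# \P{P}-based class, which the standard-library re module cannot parse.
PUNCTUATION = string.punctuation.replace('-', '')

PATTERNS = [
    re.compile(r'[\w.-]+@[\w.-]+'),                          # email addresses
    re.compile('[' + re.escape(PUNCTUATION) + ']'),          # ASCII punctuation
    re.compile(r'\b(?:and|or|not|a|an|is|the|of|like)\b'),   # common words
]

def clean_data(data):
    """Return a new list with every pattern match removed from each string."""
    cleaned = []
    for item in data:
        for pattern in PATTERNS:
            # sub() returns a new string; keep the result for the next pattern.
            item = pattern.sub('', item)
        cleaned.append(item)
    return cleaned

data = ['this is like my name: Bob.', 'my email is bob@gmail.com']
print(clean_data(data))
```

Removing whole words leaves the surrounding spaces behind, so a further pass such as `' '.join(s.split())` may be wanted if whitespace matters.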