I am new to programming, and especially to regex. I have run into a problem mapping dictionary items to a pandas DataFrame column.
Here is a minimal reproducible example (my real dataset is large). My CSV file looks like this:
id | color | status |
---|---|---|
1 | red | "this is equal to the / number 3" |
2 | yellow | you should visit the url \n http:13/color/findings/7 |
67 | green | conver it to a new value |
7 | blue | "this is equal to the / number 13" |
8 | green | conver it to a new value |
23 | white | you should visit the url \n http:13/color/findings/67 |
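To reproduce this without the CSV file, the same rows can be built directly in pandas. This is a sketch that assumes the double quotes are part of the status text and that the `\n` is a real line break. In a regex, `\n` matches a line-break character, so if the file actually contains the two characters `\` and `n`, the dictionary keys would need `\\n` in a raw string instead.

```python
import pandas as pd

# Sample rows copied from the table above.
# Assumption: the double quotes belong to the text and "\n" is a real newline.
df = pd.DataFrame({
    "id": [1, 2, 67, 7, 8, 23],
    "color": ["red", "yellow", "green", "blue", "green", "white"],
    "status": [
        '"this is equal to the / number 3"',
        "you should visit the url \n http:13/color/findings/7",
        "conver it to a new value",
        '"this is equal to the / number 13"',
        "conver it to a new value",
        "you should visit the url \n http:13/color/findings/67",
    ],
})

print(df)
```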
The result I would like is to replace the status of each row with a more generic version:
id | color | status |
---|---|---|
1 | red | "this is equal to a number" |
2 | yellow | you should visit the corresponding website |
67 | green | conver it to a new value |
7 | blue | "this is equal to a number" |
8 | green | conver it to a new value |
23 | white | you should visit the corresponding website |
My approach is to create a dictionary whose keys are regex patterns matching the original status comments and whose values are the corresponding generic comments, and then use it to replace them:
my_dict = {
    r'"this is equal to the / number \d+"': '"this is equal to a number"',
    r'you should visit the url \n http:\d+/color/findings/\d+': 'you should visit the corresponding website',
    'conver it to a new value': 'conver it to a new value',
}
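Because the keys are regular expressions, pandas can also apply the whole dictionary in one call: with `regex=True`, `Series.replace` treats each dictionary key as a pattern and its value as the replacement. A minimal sketch, assuming the status text contains literal double quotes and that `\n` is a real newline:

```python
import pandas as pd

# Regex keys as raw strings so "\d" and "\n" reach the regex engine unchanged.
my_dict = {
    r'"this is equal to the / number \d+"': '"this is equal to a number"',
    r"you should visit the url \n http:\d+/color/findings/\d+": "you should visit the corresponding website",
    r"conver it to a new value": "conver it to a new value",
}

# Assumed sample values: literal double quotes and real newline characters.
status = pd.Series([
    '"this is equal to the / number 3"',
    "you should visit the url \n http:13/color/findings/7",
    "conver it to a new value",
    '"this is equal to the / number 13"',
    "conver it to a new value",
    "you should visit the url \n http:13/color/findings/67",
])

# With regex=True every key is compiled as a pattern and each match is
# substituted by the corresponding value (like re.sub on each string).
result = status.replace(my_dict, regex=True)
print(result.tolist())
```

Note that this substitutes only the matched part of each string; here every pattern covers the whole status, so the full text is replaced.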
For my first attempt, I tried to replace them by mapping:
df['status'] = [next((v for k,v in my_dict.items() if k in x), float('nan')) for x in df['status'].tolist()]
This only replaces the status that is identical to its dictionary key, "conver it to a new value"; the other rows come out as NaN.
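The first attempt only catches the third key because `k in x` is a plain substring test: it looks for the key's characters literally (backslashes and `\d+` included) instead of treating the key as a regular expression. A minimal sketch that keeps the same list comprehension but swaps in `re.search`, assuming the sample rows contain literal double quotes and real newlines:

```python
import re

import pandas as pd

# Same patterns as in the question, written as raw strings.
my_dict = {
    r'"this is equal to the / number \d+"': '"this is equal to a number"',
    r"you should visit the url \n http:\d+/color/findings/\d+": "you should visit the corresponding website",
    r"conver it to a new value": "conver it to a new value",
}

# Assumed sample rows: literal double quotes, real newline characters.
df = pd.DataFrame({"status": [
    '"this is equal to the / number 3"',
    "you should visit the url \n http:13/color/findings/7",
    "conver it to a new value",
    '"this is equal to the / number 13"',
    "conver it to a new value",
    "you should visit the url \n http:13/color/findings/67",
]})

# Substring test vs regex search on the same text:
print(r"number \d+" in "number 3")                  # False
print(bool(re.search(r"number \d+", "number 3")))   # True

# re.search(k, x) interprets each key as a pattern; NaN if no key matches.
df["status"] = [
    next((v for k, v in my_dict.items() if re.search(k, x, re.IGNORECASE)), float("nan"))
    for x in df["status"].tolist()
]
print(df["status"].tolist())
```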
I also tried:
dictkeys_pattern = re.compile('|'.join(my_dict), re.IGNORECASE)
status_found = df['status'].str.findall(my_dict)
stat = []
for i in status_found:
    for k, v in my_dict.items():
        if re.match(k, i, re.IGNORECASE):
            stat.append(v)
        else:
            stat = None
if status_found:
    stat = []
    for i in status_found:
        for k, v in my_dict.items():
            if re.match(k, i, re.IGNORECASE):
                stat.append(v)
            else:
                stat = None
However, status_found is an empty Series.
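A few things in the second attempt look suspicious: `dictkeys_pattern` is built but never used (`str.findall` receives `my_dict` instead); each element of `status_found` is a list of matches, so `re.match` receives a list rather than a string; `stat = None` in the `else` branch throws the list away as soon as one key fails to match; and `if status_found:` raises `ValueError` for a Series because its truth value is ambiguous. A minimal sketch of the same idea with those points addressed, assuming the sample rows contain literal double quotes and real newlines (`to_generic` is a hypothetical helper name):

```python
import re

import pandas as pd

my_dict = {
    r'"this is equal to the / number \d+"': '"this is equal to a number"',
    r"you should visit the url \n http:\d+/color/findings/\d+": "you should visit the corresponding website",
    r"conver it to a new value": "conver it to a new value",
}

# Assumed sample rows: literal double quotes, real newline characters.
status = pd.Series([
    '"this is equal to the / number 3"',
    "you should visit the url \n http:13/color/findings/7",
    "conver it to a new value",
    '"this is equal to the / number 13"',
    "conver it to a new value",
    "you should visit the url \n http:13/color/findings/67",
])

# Join all keys into one alternation; (?:...) keeps each key self-contained
# and adds no capturing groups, so findall returns whole matches.
dictkeys_pattern = re.compile("|".join(f"(?:{k})" for k in my_dict), re.IGNORECASE)

# Pass the compiled pattern, not the dict, to str.findall.
status_found = status.str.findall(dictkeys_pattern)


def to_generic(matches):
    """Hypothetical helper: value of the first key that fully matches a found string."""
    for match in matches:  # each row holds a list of matched strings
        for k, v in my_dict.items():
            if re.fullmatch(k, match, re.IGNORECASE):
                return v
    return None  # nothing in this row matched any key


stat = status_found.apply(to_generic)
print(stat.tolist())
```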
Could someone help me and show me what I am doing wrong?