Basically there are two columns: Customer_column and Company_column. Customer_column is taking input from customer also there is a rule to insert data in (OR,AND) condition.
I want check one column value into the another column. The tricky part is like there is an OR , AND condition values available in column which I want to check. The pipe (|) symbol stands for OR condition, and the comma (,) stands for AND condition.
For example: (1) if my one column (Customer_column) has (Oil|Leak) that means I want to check 'oil' or 'leak'. One of them should be available in respective column (Company_column).(2) If it contains ((Oil)|(Leak,engine)) that means I want to check whether 'oil' word should be available or 'leak' + 'engine' word should be available in the respective column.(3)If it contains like((Oil|Leak),engine) that means I want to check here whether a combination of 'oil'+ 'engine' word should be available or 'leak' + 'engine' word should be available in the respective column.(4) )If it contains like ((Oil|Leak|Water),engine,Machine) that means I want to check here whether a combination of 'oil'+'engine'+'machine' word should be available or 'leak' +'engine'+'machine' word or 'water' +'engine'+'machine' word should be available in the respective column.
Below is my data frame:
import pandas as pd
data = {'Customer_column': ['(Oil|Leak)', '((Oil)|(Leak,engine))', '(Oil|Leak),engine)', '((Oil|Leak|Water),engine,Machine)', '(Leak,water,There)|(Mark,water,There)'],
'Company_column': ['(leak is present in radiator)', '(engine is leaking)', '(water leak from radiator)', '(water & oil is available in engine machine)', '(there is a water leak mark at engine)']
}
df = pd.DataFrame(data)
print (df)
Below is my expected output:
data = {'Customer_column': ['(Oil|Leak)', '((Oil)|(Leak,engine))', '(Oil|Leak),engine)', '((Oil|Leak|Water),engine,Machine)', '(Leak,water,There)|(Mark,water,There)'],
'Company_column': ['(leak is present in radiator)', '(engine is leaking)', '(water leak from radiator)', '(water & oil is available in engine machine)', '(there is a water leak mark at engine)'],
'Result': ['Leak', 'Leak,engine', 'None', 'oil engine machine,water engine machine', 'Leak water There,Mark water There'],
}
df = pd.DataFrame(data)
print (df)
I tried regex and contain method to solve this. For the OR condition I got my result but I am getting wrong output where the AND condition is written.
import re
df['match'] = [m.group() if (m:=re.search(fr'\b{re.escape(b)}\b', a, flags=re.I)) else None
for a,b in zip(df['Customer_column'], df['Company_column'])]
The second code I tried:
def matches(cust, comp):
words_comp = set(comp[1:-1].casefold().split())
return '+'.join([x for x in cust[1:-1].split('|')
if set(x.casefold().split(','))
.issubset(words_comp)
])
df['match'] = [matches(cust, comp) for cust, comp in
zip(df['Customer_column'], df['Company_column'])]
df
The above one is some how giving me correct result with some limitation of brackets but gives wrong output in condition 3rd and 4th.