0

Basically there are two columns: Customer_column and Company_column. Customer_column is taking input from customer also there is a rule to insert data in (OR,AND) condition.

I want check one column value into the another column. The tricky part is like there is an OR , AND condition values available in column which I want to check. The pipe (|) symbol stands for OR condition, and the comma (,) stands for AND condition.

For example: (1) if my one column (Customer_column) has (Oil|Leak) that means I want to check 'oil' or 'leak'. One of them should be available in respective column (Company_column).(2) If it contains ((Oil)|(Leak,engine)) that means I want to check whether 'oil' word should be available or 'leak' + 'engine' word should be available in the respective column.(3)If it contains like((Oil|Leak),engine) that means I want to check here whether a combination of 'oil'+ 'engine' word should be available or 'leak' + 'engine' word should be available in the respective column.(4) )If it contains like ((Oil|Leak|Water),engine,Machine) that means I want to check here whether a combination of 'oil'+'engine'+'machine' word should be available or 'leak' +'engine'+'machine' word or 'water' +'engine'+'machine' word should be available in the respective column.

Below is my data frame:

import pandas as pd

data = {'Customer_column': ['(Oil|Leak)', '((Oil)|(Leak,engine))', '(Oil|Leak),engine)', '((Oil|Leak|Water),engine,Machine)', '(Leak,water,There)|(Mark,water,There)'],
        'Company_column': ['(leak is present in radiator)', '(engine is leaking)', '(water leak from radiator)', '(water & oil is available in engine machine)', '(there is a water leak mark at engine)']
        }

df = pd.DataFrame(data)

print (df)

Below is my expected output:

data = {'Customer_column': ['(Oil|Leak)', '((Oil)|(Leak,engine))', '(Oil|Leak),engine)', '((Oil|Leak|Water),engine,Machine)', '(Leak,water,There)|(Mark,water,There)'],
        'Company_column': ['(leak is present in radiator)', '(engine is leaking)', '(water leak from radiator)', '(water & oil is available in engine machine)', '(there is a water leak mark at engine)'],
        'Result': ['Leak', 'Leak,engine', 'None', 'oil engine machine,water engine machine', 'Leak water There,Mark water There'],
        }

df = pd.DataFrame(data)

print (df)

I tried regex and contain method to solve this. For the OR condition I got my result but I am getting wrong output where the AND condition is written.

import re
df['match'] = [m.group() if (m:=re.search(fr'\b{re.escape(b)}\b', a, flags=re.I)) else None
               for a,b in zip(df['Customer_column'], df['Company_column'])]

The second code I tried:

def matches(cust, comp):
    words_comp = set(comp[1:-1].casefold().split())
    return '+'.join([x for x in cust[1:-1].split('|')
                     if set(x.casefold().split(','))
                           .issubset(words_comp)
                    ])

df['match'] = [matches(cust, comp) for cust, comp in
               zip(df['Customer_column'], df['Company_column'])]

df

The above one is some how giving me correct result with some limitation of brackets but gives wrong output in condition 3rd and 4th.

0 Answers0