I have two dataframes: one containing a column of strings (df = data) which I need to categorise, and the other containing the possible categories and their search terms (df = categories). I would like to add a column to the "data" dataframe which returns a category based on the search terms. For example:
data:
**RepairName**
A/C is not cold
flat tyre is c
the tyre needs a repair on left side
the aircon is not cold
categories:
| **Category** | **SearchTerm** |
| --- | --- |
| A/C | aircon |
| A/C | A/C |
| Tyre | repair |
| Tyre | flat |
DESIRED RESULT data:
| **RepairName** | **Category** |
| --- | --- |
| A/C is not cold | A/C |
| flat tyre is c | Tyre |
| the tyre needs a repair on left side | Tyre |
| the aircon is not cold | A/C |
I have tried the following lambda function with apply. I am not sure if my column references are in the correct place:
data['Category'] = data['RepairName'].apply(lambda x: categories['Category'] if categories['SearchTerm'] in x else "")
data['Category'] = [categories['Category'] if categories['SearchTerm'] in data['RepairName'] else 0]
but I keep getting the error message:
TypeError: 'in <string>' requires string as left operand, not Series
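For context, this error appears to come from the expression `categories['SearchTerm'] in x`: `x` is a single string, and `in` on a string only accepts another string on its left, not a whole Series. A minimal sketch that reproduces the message and shows a per-term check instead (the `terms`, `text`, `message`, and `matches` names are illustrative only):

```python
import pandas as pd

terms = pd.Series(["aircon", "A/C"])
text = "the aircon is not cold"

# Using the whole Series as the left operand of `in` raises the TypeError.
try:
    terms in text
    message = None
except TypeError as exc:
    message = str(exc)

# Testing one term at a time works, because each term is a plain string.
matches = [term for term in terms if term in text]
```

Here `message` holds the same text as the traceback above, and `matches` contains only the terms found in `text`.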
The following returns True/False depending on whether any search term appears in RepairName; however, I have not been able to return the category associated with the matching search term:
data['containName']=data['RepairName'].str.contains('|'.join(categories['SearchTerm']),case=False)
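One possible way to extend the same joined pattern so that it returns the category rather than True/False is to capture the matching term with `str.extract` and then map it through a lookup. This is only a sketch, built on the example tables above; `pattern`, `lookup`, and `matched` are names introduced here, and the terms are escaped on the assumption that they should be matched literally rather than as regular expressions.

```python
import re
import pandas as pd

data = pd.DataFrame({"RepairName": [
    "A/C is not cold",
    "flat tyre is c",
    "the tyre needs a repair on left side",
    "the aircon is not cold",
]})
categories = pd.DataFrame({
    "Category": ["A/C", "A/C", "Tyre", "Tyre"],
    "SearchTerm": ["aircon", "A/C", "repair", "flat"],
})

# One capture group holding every search term, escaped so it is matched literally.
pattern = "(" + "|".join(re.escape(t) for t in categories["SearchTerm"]) + ")"

# Lower-cased lookup from search term to category.
lookup = dict(zip(categories["SearchTerm"].str.lower(), categories["Category"]))

# First matching term per row (NaN where nothing matches), ignoring case.
matched = data["RepairName"].str.extract(pattern, flags=re.IGNORECASE, expand=False)

# Translate the matched term into its category.
data["Category"] = matched.str.lower().map(lookup)
```

Rows with no matching term end up as NaN. If some search terms contain others (for example a multi-word term that starts with a shorter term), sorting the terms longest-first before joining would probably be needed so the longer one wins.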
Both of the following sometimes work, but not always (perhaps because some of my search terms are more than one word?):
data['Category'] = [
next((c for c, k in categories.values if k in s), None) for s in data['RepairName']]
d = dict(zip(categories['SearchTerm'], categories['Category']))
data['CategoryCheck'] = [next((d[y] for y in x.split() if y in d), None) for x in data['RepairName']]
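The multi-word suspicion can be checked with a small sketch. The dictionary version splits each RepairName into single words, so a search term containing a space can never equal one of those words, while a substring test can still find it. In the example below, "flat tyre" is a hypothetical multi-word term that is not in the original categories table, and `by_token`, `by_substring`, and `pairs` are illustrative names. The substring version also lower-cases both sides, on the assumption that matching should ignore case.

```python
import pandas as pd

data = pd.DataFrame({"RepairName": [
    "A/C is not cold",
    "flat tyre is c",
    "the tyre needs a repair on left side",
    "the aircon is not cold",
]})
categories = pd.DataFrame({
    "Category": ["A/C", "A/C", "Tyre", "Tyre"],
    # "flat tyre" is a hypothetical multi-word term used only for this illustration.
    "SearchTerm": ["aircon", "A/C", "repair", "flat tyre"],
})

# Token lookup: only whole single words can ever equal a dictionary key.
d = dict(zip(categories["SearchTerm"], categories["Category"]))
by_token = [next((d[w] for w in s.split() if w in d), None)
            for s in data["RepairName"]]

# Substring lookup on lower-cased text: multi-word terms can match too.
pairs = [(term.lower(), cat)
         for cat, term in zip(categories["Category"], categories["SearchTerm"])]
by_substring = [next((cat for term, cat in pairs if term in s.lower()), None)
                for s in data["RepairName"]]
```

With this data, the token version leaves "flat tyre is c" uncategorised, while the substring version assigns it to Tyre.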