I am struggling to get the code below to work. It is probably an easy fix, but I can't get it to behave correctly. What I want is a boolean mask on a pandas DataFrame that returns only the rows where the value in "Actual Manufacturer" or "Actual Collection" appears in "PqaQuestion". It works well with a single criterion, but adding a second one breaks things: I can't get the "or" operator in there without triggering "UserWarning: Boolean Series key will be reindexed to match DataFrame index", and the output comes out wrong. If anyone could help fix this, and also help me understand why it happens, I would greatly appreciate it. I've seen other posts on the subject, but none of them explain the cause, and I haven't been able to adapt them to my own situation.
names= ['PqaPrSKU', 'PrName', 'White Label Manufacturer', 'White Label Collection', 'Actual Manufacturer', 'Actual MaID', 'Actual Collection', 'PqaID', 'PqaQuestion', 'UpdatedQuestion', 'PanID', 'PanAnswer', 'UpdatedAnswer', 'DateAdded', 'PrBclgID']
def match_function(column1_value, column2_value, column3_value):
    return (column2_value is not None) and (column1_value is not None) and (column3_value is not None) and (str(column2_value).lower() in str(column1_value).lower()) or (str(column3_value).lower() in str(column1_value).lower())
import pandas as pd
df = pd.read_csv('Bucket61(8.22).csv', names= names, skipinitialspace=True, skiprows=1)
#print(df.from_records(data))
indexer = df.apply(lambda row: match_function(row["PqaQuestion"], row["Actual Collection"], row["Actual Manufacturer"]), axis=1)
filtered_df = df[indexer]
print(filtered_df[indexer])
#print(df[indexer])
from pandas import ExcelWriter
# Write the filtered rows to Excel; the context manager saves and closes the file
with ExcelWriter('ScrubbedQATemplate.xlsx') as writer:
    filtered_df.to_excel(writer, sheet_name='Sheet1')
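
For anyone trying to reproduce this, here is a minimal sketch of the two things that appear to be going on. It uses a made-up three-row frame in place of the CSV, and the helper name `contains_either` is hypothetical. First, `and` binds tighter than `or`, so in `match_function` the final `or (...)` clause is evaluated outside the `is not None` checks; on top of that, missing CSV cells arrive as NaN rather than None, and `str(NaN)` is `'nan'`, which can match inside ordinary words. Second, the warning seems to come from applying the full-length mask to a frame that has already been filtered, because the mask's index no longer matches that frame's index.

```python
import warnings
import pandas as pd

# Hypothetical stand-in for the CSV; only the three relevant columns.
df = pd.DataFrame({
    "PqaQuestion": [
        "Is the Acme sofa washable?",
        "Does the Luna bed tilt?",
        "Is it banana yellow?",
    ],
    "Actual Manufacturer": ["Acme", "Other", float("nan")],
    "Actual Collection": ["Basics", "Luna", "Nano"],
})


def contains_either(question, collection, manufacturer):
    """True if the collection or the manufacturer occurs in the question."""
    if pd.isna(question):
        return False
    q = str(question).lower()
    # Each candidate must be present (not NaN/None) AND be a substring of
    # the question; the missing-value check is applied per candidate.
    return any(
        pd.notna(value) and str(value).lower() in q
        for value in (collection, manufacturer)
    )


mask = df.apply(
    lambda row: contains_either(
        row["PqaQuestion"], row["Actual Collection"], row["Actual Manufacturer"]
    ),
    axis=1,
)

# mask and df share the same index, so this selection does not warn.
filtered_df = df[mask]
print(filtered_df)

# Re-applying the full-length mask to the already-filtered frame is the
# step that emits the reindexing warning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    _ = filtered_df[mask]
print([str(w.message) for w in caught])
```

Under these assumptions, printing `filtered_df` directly (rather than `filtered_df[indexer]`) should avoid the warning, since the mask has already been applied once.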