I have code that uses nested iterrows loops, and I've read that iterrows is much slower than .apply or vectorization:
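import openpyxl

# `output` is the path to the existing workbook; dfA and dfB are loaded earlier.
wb = openpyxl.load_workbook(output)
ws = wb['Sheet1']
for indexA, rowA in dfA.iterrows():
    nameA = rowA[0]
    for indexB, rowB in dfB.iterrows():
        nameB = rowB[14]
        if nameB.startswith(nameA):
            print(f"Found match : {nameB} starts with {nameA}")
            # Copy values from the matching dfB row (rowB, not the scalar indexB).
            # Assumes indexA is a valid 1-based Excel row number.
            ws[f"A{indexA}"] = rowB[1]
            ws[f"B{indexA}"] = rowB[2]
            ws[f"C{indexA}"] = rowB[3]
wb.save(output)
wb.close()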
I can't figure out how to use vectorization or .apply for this part. Currently, with 500,000 rows in dfA, it takes over 3 hours. I'm looking for any way to speed this up. Thanks for your help!
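One direction that might be worth trying, as a rough sketch rather than a tested fix: every string that starts with a given prefix sits in one contiguous run once the names are sorted, so the inner loop over dfB could be replaced by two binary searches. The function name `last_prefix_match` and the sample lists are hypothetical; in the real code the lists would presumably come from column 0 of dfA and column 14 of dfB. The sketch keeps the last match in dfB order, because the nested loop above overwrites earlier matches.

```python
from bisect import bisect_left


def last_prefix_match(names_a, names_b):
    """For each name in names_a, return the position in names_b of the last
    entry that starts with it, or -1 if no entry does."""
    # Sort dfB's names once, remembering each name's original position.
    order = sorted(range(len(names_b)), key=lambda i: names_b[i])
    sorted_b = [names_b[i] for i in order]

    result = []
    for a in names_a:
        # Lower bound: first sorted name that is >= the prefix.
        lo = bisect_left(sorted_b, a)
        # Upper bound: assumes no name has U+10FFFF right after the prefix.
        hi = bisect_left(sorted_b, a + "\U0010FFFF")
        # Keep the match that appears last in the original dfB order.
        result.append(max(order[lo:hi], default=-1))
    return result


# Hypothetical sample data standing in for the two name columns.
names_a = ["app", "ban", "zzz"]
names_b = ["banana", "apple", "cherry", "application"]

matches = last_prefix_match(names_a, names_b)
print(matches)
```

With the positions in hand, the three values for each matched row could be looked up once and written to the worksheet in a single pass over dfA. This assumes every name is a string; missing values would need to be filtered out first.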