I am using the Python Record Linkage Toolkit (recordlinkage) to merge two CSV files by fuzzy matching on company name and state.
When I run the code, I get a KeyError saying that a label was not found in the dataframe, and I do not understand what I need to change to get the code to run.
Here is the code:

    import pandas as pd
    import recordlinkage

    reference_usa = pd.read_csv('all_reference_usa.csv', index_col='id')
    oc_sample = pd.read_csv('oc_sample.csv', index_col='company_number', low_memory=False)

    indexer = recordlinkage.Index()
    indexer.sortedneighbourhood(left_on='state', right_on='state')
    candidates = indexer.index(reference_usa, oc_sample)
    print(len(candidates))

    compare = recordlinkage.Compare()
    compare.string('companyname',
                   'name',
                   threshold=0.95)
    features = compare.compute(candidates, reference_usa,
                               oc_sample)

Here is the error message:

      File "/Users/Desktop/python/example.py", line 16, in <module>
        features = compare.compute(candidates, reference_usa,
      File "/Users//anaconda3/lib/python3.10/site-packages/recordlinkage/base.py", line 862, in compute
        results = self._compute(pairs, x, x_link)
      File "/Users//anaconda3/lib/python3.10/site-packages/recordlinkage/base.py", line 686, in _compute
        sublabels_left = self._get_labels_left(validate=x)
      File "/Users/anaconda3/lib/python3.10/site-packages/recordlinkage/base.py", line 652, in _get_labels_left
        raise KeyError(error_msg)
    KeyError: 'label is not found in the dataframe'
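
The traceback ends in _get_labels_left, which suggests the label that cannot be found is one requested from the left dataframe (reference_usa), here 'companyname'. A minimal sketch of a check for this, assuming the cause is a column name that does not match the CSV header exactly: the helper missing_labels and the two header rows below are hypothetical stand-ins, not part of recordlinkage or of the real files.

```python
import csv
import io


def missing_labels(columns, required):
    """Return the labels in `required` that are absent from `columns`."""
    available = set(columns)
    return [label for label in required if label not in available]


# Hypothetical header rows standing in for the two CSV files.
# The real headers may differ; these only illustrate the check.
left_header = next(csv.reader(io.StringIO("id,company_name,state\n")))
right_header = next(csv.reader(io.StringIO("company_number,name,state\n")))

# Labels the script asks for on each side (indexing on 'state',
# string comparison on 'companyname' / 'name').
left_missing = missing_labels(left_header, ["companyname", "state"])
right_missing = missing_labels(right_header, ["name", "state"])

print("missing on the left:", left_missing)
print("missing on the right:", right_missing)
```

With the real data, the same helper can be called as missing_labels(reference_usa.columns, ['companyname', 'state']) and missing_labels(oc_sample.columns, ['name', 'state']). The comparison is exact, so differences in case, underscores, or stray whitespace in the header would show up as a missing label.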