I am using the Python Record Linkage Toolkit (`recordlinkage`) to merge two CSV files by fuzzy matching on company name and state.

When I run the code, I get a `KeyError` saying the label is not found in the dataframe, and I do not understand what I need to change to get the code to run.

Here is the code:

```python
import pandas as pd
import recordlinkage

reference_usa = pd.read_csv('all_reference_usa.csv', index_col='id')
oc_sample = pd.read_csv('oc_sample.csv', index_col='company_number', low_memory=False)

indexer = recordlinkage.Index()
indexer.sortedneighbourhood(left_on='state', right_on='state')
candidates = indexer.index(reference_usa, oc_sample)
print(len(candidates))

compare = recordlinkage.Compare()
compare.string('companyname',
               'name',
               threshold=0.95)
features = compare.compute(candidates, reference_usa,
                           oc_sample)
```

Here is the error message:

```
File "/Users/Desktop/python/example.py", line 16, in <module>
    features = compare.compute(candidates, reference_usa,
  File "/Users//anaconda3/lib/python3.10/site-packages/recordlinkage/base.py", line 862, in compute
    results = self._compute(pairs, x, x_link)
  File "/Users//anaconda3/lib/python3.10/site-packages/recordlinkage/base.py", line 686, in _compute
    sublabels_left = self._get_labels_left(validate=x)
  File "/Users/anaconda3/lib/python3.10/site-packages/recordlinkage/base.py", line 652, in _get_labels_left
    raise KeyError(error_msg)
KeyError: 'label is not found in the dataframe'
```
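The traceback ends in `_get_labels_left`, which suggests the left-hand column label passed to `compare.string` (here `'companyname'`) is being checked against the columns of the first dataframe (`reference_usa`). A minimal sketch of a check that might narrow this down is below. `missing_labels` is a hypothetical helper, not part of `recordlinkage`, and the column names in the example are made up for illustration since the real CSV headers are not shown.

```python
def missing_labels(columns, labels):
    """Return the labels that are not present among the column names."""
    available = set(columns)
    return [label for label in labels if label not in available]


# Made-up column names for illustration only; not the real CSV headers.
left_columns = ["company_name", "state", "city"]
print(missing_labels(left_columns, ["companyname", "state"]))  # ['companyname']
```

With the real data, this could be called as `missing_labels(reference_usa.columns, ['state', 'companyname'])` and `missing_labels(oc_sample.columns, ['state', 'name'])`. Two things may be worth keeping in mind: a column named in `index_col` becomes the index and no longer appears in `.columns`, and header names that differ in case or carry stray whitespace count as different labels.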
