I am trying to use record linkage to merge two datasets by fuzzy matching. I am positive that there are no duplicate unique IDs in either dataset. However, I am getting an error saying that there are no potential candidates. How do I fix this?

Here is my code:

import pandas as pd 
import recordlinkage

reference_usa = pd.read_csv('all_reference_usa.csv', index_col='id')
oc_sample = pd.read_csv('oc_sample.csv', index_col='id', low_memory=False)

# Block on state so that only records from the same state are paired.
indexer = recordlinkage.Index()
indexer.block(left_on='state', right_on='state')
candidates = indexer.index(reference_usa, oc_sample)
print(len(candidates))  # number of candidate pairs

# Compare state exactly and company names by string similarity.
compare = recordlinkage.Compare()
compare.exact('state', 'state', label='state')
compare.string('companyname', 'name', threshold=0.95, label='company')
features = compare.compute(candidates, reference_usa, oc_sample)

Here is the message I get:

/Users/anaconda3/lib/python3.10/site-packages/recordlinkage/algorithms/string.py:55: FutureWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
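One thing that may be worth checking is whether the two `state` columns share any values at all, since blocking only pairs records whose key values are identical. The sketch below is a diagnostic under that assumption, which is not confirmed for this data: the helper names `normalize_key` and `shared_block_keys` are hypothetical (not part of recordlinkage), and the toy lists stand in for `reference_usa['state']` and `oc_sample['state']`.

```python
def normalize_key(value):
    """Return a stripped, upper-cased key, or None for non-strings such as NaN."""
    if not isinstance(value, str):
        return None
    cleaned = value.strip().upper()
    return cleaned or None


def shared_block_keys(left_values, right_values, normalize=False):
    """Return the set of string key values that occur on both sides."""
    def prepare(values):
        if normalize:
            keys = (normalize_key(v) for v in values)
        else:
            # Keep raw strings only; missing values cannot form a block.
            keys = (v if isinstance(v, str) else None for v in values)
        return {k for k in keys if k is not None}

    return prepare(left_values) & prepare(right_values)


# Toy data (an assumption for illustration, not the real columns).
left_states = ["CA", "NY", "TX"]
right_states = ["ca ", " ny", float("nan")]

raw_overlap = shared_block_keys(left_states, right_states)
clean_overlap = shared_block_keys(left_states, right_states, normalize=True)

print(raw_overlap)            # set()
print(sorted(clean_overlap))  # ['CA', 'NY']
```

With the real data, the same helper could be called as `shared_block_keys(reference_usa['state'], oc_sample['state'])`. If the raw overlap is empty but the normalized overlap is not, cleaning both columns the same way before calling `indexer.block` would be a reasonable next step. If both are empty, the two files may encode states differently (for example abbreviations versus full names).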
