I am trying to use recordlinkage to merge two datasets by fuzzy matching. I am positive that there are no duplicate unique ids in either dataset. However, I am getting an error saying that there are no potential candidates. How do I fix this?
Here is my code:
import pandas as pd
import recordlinkage
reference_usa = pd.read_csv('all_reference_usa.csv', index_col='id')
oc_sample = pd.read_csv('oc_sample.csv', index_col='id', low_memory=False)
indexer = recordlinkage.Index()
indexer.block(left_on='state', right_on='state')
candidates = indexer.index(reference_usa, oc_sample)
print(len(candidates))
compare = recordlinkage.Compare()
compare.exact('state', 'state', label='state')
compare.string('companyname',
               'name',
               threshold=0.95,
               label='company')
features = compare.compute(candidates, reference_usa,
                           oc_sample)
Here is the error:
/Users/anaconda3/lib/python3.10/site-packages/recordlinkage/algorithms/string.py:55: FutureWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
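The message above is a FutureWarning rather than an exception, and it mentions an empty Series, which would be consistent with `candidates` being empty. Blocking on `state` only pairs records whose `state` values are exactly equal, so one possible cause is that the two files spell the key differently (case, stray whitespace, abbreviation versus full name). The following is a minimal diagnostic sketch using only pandas. The small DataFrames are hypothetical stand-ins for the two CSV files, and `blocking_key_overlap` and `normalize_key` are helper names made up for this illustration, not part of recordlinkage.

```python
import pandas as pd


def blocking_key_overlap(left, right, left_on, right_on):
    """Return the set of blocking-key values that appear in both frames."""
    left_keys = set(left[left_on].dropna().unique())
    right_keys = set(right[right_on].dropna().unique())
    return left_keys & right_keys


def normalize_key(series):
    """Strip surrounding whitespace and upper-case a string key column."""
    return series.astype("string").str.strip().str.upper()


# Hypothetical stand-in data; the real frames would come from pd.read_csv.
reference_usa = pd.DataFrame(
    {"state": ["CA", "NY"], "companyname": ["Acme Inc", "Globex"]},
    index=pd.Index([1, 2], name="id"),
)
oc_sample = pd.DataFrame(
    {"state": ["ca ", "ny"], "name": ["Acme Inc.", "Globex Corp"]},
    index=pd.Index([10, 11], name="id"),
)

# Before normalizing: no shared key values, so blocking would yield no pairs.
raw_overlap = blocking_key_overlap(reference_usa, oc_sample, "state", "state")
print(len(raw_overlap))

# After normalizing both key columns the values line up.
reference_usa["state"] = normalize_key(reference_usa["state"])
oc_sample["state"] = normalize_key(oc_sample["state"])
clean_overlap = blocking_key_overlap(reference_usa, oc_sample, "state", "state")
print(sorted(clean_overlap))
```

If the overlap on the real data is empty before normalizing and non-empty afterwards, running the blocking step on the normalized columns should produce candidate pairs. If it is still empty, the two files may encode states in different ways that simple cleanup does not reconcile.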