I am trying to use the dedupe.io Python library, but I need it to read from an MS-SQL database.
I first got the CSV example working, then started converting the pgSQL example to an MS-SQL version. I am about halfway through the conversion and have run into a problem.
I am stuck where the script writes the blocking map out to a CSV file. This is the call that appears to fail:
b_data = deduper.blocker(full_data)
According to the documentation, this call:
Yields tuples of (predicate, record_id)
However, I get the following error:
File "C:\PythonV\dedupeio\dedupe\lib\site-packages\dedupe\blocking.py", line 42, in __call__
record_id, instance = record
TypeError: cannot unpack non-iterable int object
Thinking I might be doing something wrong, I followed the same logic and added the same function call to the CSV version, which otherwise works (it runs and outputs duplicates), and I get the same error there. So either I am missing something or this is a bug. Either way, how can I work around it?
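
For reference, the error can be reproduced without dedupe at all. The sketch below assumes that full_data is a dict keyed by integer record IDs; that is an assumption, since the code that builds it is not shown here. The function toy_blocker is a hypothetical stand-in, not dedupe's implementation. It only repeats the unpacking step from the traceback. Iterating over a dict yields its keys, so each "record" is a bare int; passing full_data.items() yields (record_id, record) pairs instead, which may be what the blocker expects.

```python
def toy_blocker(records):
    """Hypothetical stand-in for the unpacking step shown in the traceback."""
    for record in records:
        # Same unpacking as the failing line in blocking.py
        record_id, instance = record
        # Yield (predicate, record_id), mirroring the documented output shape
        yield ("predicate:" + str(instance["name"]), record_id)


# Assumed shape of full_data: {record_id: {field: value, ...}}
full_data = {1: {"name": "alice"}, 2: {"name": "bob"}}

# Passing the dict itself iterates over its keys (ints), which cannot be unpacked
try:
    list(toy_blocker(full_data))
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Passing .items() supplies (record_id, record) pairs, so unpacking succeeds
blocks = list(toy_blocker(full_data.items()))
```

If that assumption holds, the equivalent change in the script would be deduper.blocker(full_data.items()), but I have not confirmed this against the real library.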