I am trying to use the dedupe.io Python library, but for my use case I need to read the data from an MS-SQL database.

I first got the CSV example working, then started converting the pgSQL example to an MS-SQL version. I am about halfway through converting the script and have run into a problem.

I am stuck at the point where the script writes the blocking map out to a CSV file. This is the call that appears to fail:

b_data = deduper.blocker(full_data)

According to the documentation it should:

Yields tuples of (predicate, record_id)

However I get the following error:

File "C:\PythonV\dedupeio\dedupe\lib\site-packages\dedupe\blocking.py", line 42, in __call__
    record_id, instance = record
TypeError: cannot unpack non-iterable int object

Thinking I might have made a mistake in the conversion, I added the same call to the CSV version. That script otherwise works (it runs and outputs duplicates), but the added call raises the same error. So either I am missing something or this is a bug. Either way, how can I work around it?

MyNameIsCaleb

1 Answer

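The problem is that full_data is not a sequence of (record_id, record) tuples, where each record is a dict. The traceback shows that iterating over full_data produced a bare int, which cannot be unpacked into an id and a record. See https://docs.dedupe.io/en/latest/API-documentation.html#Dedupe.blocker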
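As a rough illustration, here is a minimal sketch of one way this error can arise and how the input can be reshaped. It assumes full_data is a dict keyed by integer ids, which the question does not confirm. The rows, the donor_id field, and both helper functions are hypothetical; unpack_like_blocker only imitates the unpacking line from the traceback and is not part of dedupe.

```python
def as_record_pairs(rows, id_field):
    """Turn an iterable of row dicts into a list of (record_id, record) tuples."""
    return [(row[id_field], row) for row in rows]


def unpack_like_blocker(data):
    """Hypothetical stand-in for the unpacking step shown in the traceback."""
    ids = []
    for record in data:
        record_id, instance = record  # fails if record is a bare int
        ids.append(record_id)
    return ids


# Made-up rows, shaped like what a database cursor might return as dicts.
rows = [
    {"donor_id": 1, "name": "Ann Smith", "city": "Chicago"},
    {"donor_id": 2, "name": "Anne Smith", "city": "Chicago"},
]

# Assumption: the data was collected into a dict keyed by record id.
data_d = {row["donor_id"]: row for row in rows}

# Iterating a dict yields only its keys (ints here), so unpacking fails.
try:
    unpack_like_blocker(data_d)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Either of these yields the (record_id, record) pairs the blocker expects.
pairs_from_dict = list(data_d.items())
pairs_from_rows = as_record_pairs(rows, "donor_id")

ids = unpack_like_blocker(pairs_from_dict)
```

If that assumption holds for your script, passing full_data.items() to deduper.blocker, or building the pairs directly from the cursor rows, should give the blocker the input shape described in the linked documentation.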

fgregg