I want to use the Dedupe library for record linkage. I wrote this code based on the Dedupe examples on GitHub, but when I run it I get this error:

OverflowError: Python int too large to convert to C ssize_t

I think it happens because my data is very large. How can I filter the columns of my data_d? That should help. I searched the existing Stack Overflow questions but couldn't find the right answer.

import codecs
import csv

def readData(filename):
    """
    Read in our data from a CSV file and create a dictionary of records,
    where the key is a unique record ID.
    """

    data_d = {}

    with codecs.open(filename, encoding='utf-8') as f:
        reader = csv.DictReader(f)
        for i, row in enumerate(reader):
            # preProcess is the cleaning helper from the Dedupe example script
            clean_row = {k: preProcess(v) for (k, v) in row.items()}
            data_d[filename + str(i)] = clean_row

    return data_d
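
For reference, here is a minimal sketch of the column filtering I am asking about, assuming the filtering can be done while the CSV is being read. The function name read_selected_columns and the keep_fields parameter are hypothetical (they are not part of Dedupe), and the preProcess cleaning step is left out to keep the sketch self-contained. I have not confirmed that reducing the columns makes the OverflowError go away.

```python
import csv


def read_selected_columns(filename, keep_fields):
    """
    Read a CSV file into a dictionary of records keyed by a unique ID,
    keeping only the columns named in keep_fields (hypothetical parameter).
    """
    data_d = {}

    with open(filename, encoding='utf-8', newline='') as f:
        reader = csv.DictReader(f)
        for i, row in enumerate(reader):
            # Keep only the requested columns; names missing from the file are skipped.
            data_d[filename + str(i)] = {k: row[k] for k in keep_fields if k in row}

    return data_d
```

It would be called as read_selected_columns(filename, ['name', 'address']), where the column names are only placeholders for whichever fields are used in the Dedupe field definition.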
fgregg
Dr Sima
  • Strange, I'm getting an `expected string or bytes-like object` error on line 45, in the preProcess function. Did you forget to add something necessary to run the code? Which versions of your dependencies and of the Python interpreter are you using? – Marcelo Lacerda May 15 '18 at 13:48
  • @marcelo-lacerda I use Python 3.6 and I don't get any error like this. – Dr Sima May 15 '18 at 14:17
  • @marcelo-lacerda I fixed your error with str(column). Sorry, I forgot to add it to this code; I have added it now. – Dr Sima May 15 '18 at 20:35
  • The edit you made removed the portion of the code that caused the error: http://paste.debian.net/1025113/ – Marcelo Lacerda May 17 '18 at 18:38
  • Also your old code runs fine here after using str(column): http://paste.debian.net/1025118/ http://paste.debian.net/1025121/ – Marcelo Lacerda May 17 '18 at 18:45

0 Answers