Highest Voted 'python-dedupe' Questions

2

votes

1 answer

fuzzy duplicate check using python dedupe library error

I'm trying to use the Python dedupe library to perform a fuzzy duplicate check on my mock data, but I keep getting this error: {'Vendor': {0: 'ABC', 1: 'ABC', 2: 'TIM'}, 'Doc Date': {0: '5/12/2019', 1: '5/13/2019', 2: '4/15/2019'}, 'Invoice Date':…

asked Jan 18 '20 at 21:20

python_rok

2

votes

1 answer

How to use pre labeled training data for Python Dedupe

I am using the Python Dedupe package for record linkage tasks, that is, matching company names in one data set to those in another. The Dedupe package lets the user label pairs to train a logistic regression model. However, it's a manual process and one needs to…

python duplicates record-linkage python-dedupe

asked Jul 18 '19 at 10:03

usct01

2

votes

1 answer

Dedupe Python - "Records do not line up with data model"

I am stuck setting up Python and the dedupe library from dedupe.io to deduplicate a set of entries in a Postgres database. The error is "Records do not line up with data model", which should be easy to solve, but I just do not understand why I get…

python duplicates python-dedupe

asked Jan 22 '19 at 18:56

Pixelartist

2

votes

1 answer

AttributeError: 'NoneType' object has no attribute 'learn_predicates'

I have information about known duplicates in the table learning, where entity_id is the same for duplicates. I want to train Dedupe by example, but I get an error. What am I doing wrong? con = psycopg2.connect(database=db_conf['NAME'], …

python-dedupe

asked May 10 '18 at 05:32

tatka

2

votes

1 answer

Python Record Linkage, Fuzzy Match and Deduplication

I have 3 datasets of customers with 7 columns: CustomerName, Address, Phone, StoreName, Mobile, Longitude, Latitude. Every dataset has 13000-18000 records. I am trying to fuzzy match for deduplication between them. My data set columns don't have same…

python duplicates fuzzywuzzy record-linkage python-dedupe

asked May 09 '18 at 08:17

Dr Sima

2

votes

0 answers

Scaling Dedupe package functionality to large data using mysql DB

I have been trying for a while now to build a working example of the gazetteer/dedupe that scales to semi-large datasets connecting to SQL (using examples provided by the package) and have been unsuccessful. Would really appreciate if anyone could…

mysql performance record-linkage python-dedupe entityresolver

asked Apr 12 '18 at 13:36

mersa

2

votes

2 answers

Use Python dedupe library to return all matches against messy dataset

First, if you haven't seen the Dedupe library for Python: it's awesome. Much like TensorFlow, it's a great way to bring machine learning to the masses (like me). I'm trying to do record linkage of names against a single, large, messy data set. I'm…

fuzzy-comparison record-linkage python-dedupe

asked Nov 17 '17 at 03:50

PANDA Stack

2

votes

1 answer

Increase max_components variable in dedupe library

How can I increase the default value of the max_components variable? By default max_components is set to 30000. I need to increase this limit because every time I do deduplication (using the same datasets) I get different results. I think that the total…

python pyspark record-linkage python-dedupe

asked Aug 03 '17 at 09:55

mjimcua

2

votes

0 answers

Python Postgresql dedupe consuming a lot of time. Can there be any optimization?

I am using the Postgres dedupe example code. For 10,000 rows, it takes 163 seconds. I found that most of the time is spent in this part: full_data = [] cluster_membership = collections.defaultdict(lambda : 'x') for cluster_id,…

python postgresql python-dedupe

asked Aug 01 '17 at 03:22

Shubham Singh

2

votes

3 answers

Python deduplicate records - dedupe

I want to use https://github.com/datamade/dedupe to deduplicate some records in python. Looking at their examples data_d = {} for row in data: clean_row = [(k, preProcess(v)) for (k, v) in row.items()] row_id = int(row['id']) …

python pandas dictionary record-linkage python-dedupe

asked Sep 18 '16 at 07:19

Georg Heiler

2

votes

2 answers

Setting explicit rules for matching records using Python Dedupe library

I'm using the Dedupe library to match person records to each other. My data includes name, date of birth, address, phone number and other personally identifying information. Here is my question: I always want to match two records with 100%…

python duplicates record-linkage python-dedupe

asked Sep 13 '15 at 14:02

blahblahblah

2

votes

1 answer

Python - Trouble with Dedupe: TypeError: unhashable type: 'numpy.ndarray'

I'm having trouble getting dedupe to run. I am trying to use this library to remove duplicates from a huge set of addresses. Here is my code: import collections import logging import optparse from numpy import nan import dedupe from unidecode…

python python-2.7 numpy python-dedupe

asked Jan 16 '15 at 20:41

Connor M

1

vote

1 answer

pip install pylbfgs fails in a clean virtualenv

On a completely fresh virtualenv, installing pylbfgs fails with the error below. My goal is to install dedupe, but it depends on pylbfgs. I'm assuming it has something to do with the release of Cython 3.0.0 a few days ago, but even if I do pip…

python pip cython cythonize python-dedupe

asked Jul 20 '23 at 14:59

Mathias Bak

1

vote

0 answers

Setting seed in python creating non consistent results on AWS

In my code I use the dedupe library to match records between 2 datasets. The underlying library uses random numbers from Python's random library and NumPy's random submodule, but it provides no way to set a seed for either. In our use case it is…

python amazon-web-services random-seed python-dedupe

asked Apr 24 '23 at 15:18

AMuresan

1

vote

1 answer

Pandas Dedupe: supplying self-created training data

I'm using pandas-dedupe to link a dataframe with misspellings to another with record-level info. Here is a much simplified example: df1 = pd.DataFrame({'a': ['cat', 'dog', 'frog', 'mouse', 'snake'], \ 'info': ['mammal', 'mammal', 'amphibian',…

python pandas fuzzy-search python-dedupe

asked Mar 01 '22 at 15:22

svenkatesh
