
I'm trying to get a local copy of web-dedupe running with the default setup, but it fails at the third step. I can upload the CSV, but after I select the fields and hit the submit button, I get an error:

The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.

Below is the code from app.py that handles this route, followed by the logs. Any help would be much appreciated! Here is their working demo. Please let me know if additional info is needed!

@app.route('/select_fields/', methods=['GET', 'POST'])
def select_fields():
    status_code = 200
    error = None
    print flask_session.keys()
    if not flask_session.get('deduper'):
        return redirect(url_for('index'))
    else:
        inp = flask_session['deduper']['csv'].converted
        filename = flask_session['filename']
        flask_session['last_interaction'] = datetime.now()
        reader = csv.reader(StringIO(inp))
        fields = reader.next()
        del reader
        if request.method == 'POST':
            field_list = [r for r in request.form]
            if field_list:
                training = True
                field_defs = {}
                for field in field_list:
                    field_defs[field] = {'type': 'String'}
                data_d = readData(inp)
                flask_session['deduper']['data_d'] = data_d
                flask_session['deduper']['field_defs'] = copy.deepcopy(field_defs)
                start = time.time()
                deduper = dedupe.Dedupe(field_defs)
                deduper.sample(data_d, 150000)
                flask_session['deduper']['deduper'] = deduper
                end = time.time()
                send_ga_log(
                    'Dedupe initialization', 
                    flask_session['ga_cid'], 
                    label='Timing in seconds',
                    value=int(end-start)
                )
                return redirect(url_for('training_run'))
            else:
                error = 'You must select at least one field to compare on.'
                send_ga_log('Select Fields Error', flask_session['ga_cid'], label=error)
                status_code = 500
        return render_app_template('select_fields.html', error=error, fields=fields, filename=filename)

@app.route('/training_run/')
def training_run():
    if not flask_session.get('deduper'):
        return redirect(url_for('index'))
    else:
        filename = flask_session['filename']
        return render_app_template('training_run.html', filename=filename)

App error output:

Traceback (most recent call last):
  File "/home/jbutler/fuck2/dedupe-web-master/run_queue.py", line 4, in <module>
    queue_daemon(app)
  File "/home/jbutler/fuck2/dedupe-web-master/queue.py", line 43, in queue_daemon
    msg = redis.blpop(app.config['REDIS_QUEUE_KEY'])
  File "/home/jbutler/fuck2/dedupe-web-master/deploy_scripts/build/venv/local/lib/python2.7/site-packages/redis/client.py", line 1146, in blpop
    return self.execute_command('BLPOP', *keys)
  File "/home/jbutler/fuck2/dedupe-web-master/deploy_scripts/build/venv/local/lib/python2.7/site-packages/redis/client.py", line 570, in execute_command
    connection.send_command(*args)
  File "/home/jbutler/fuck2/dedupe-web-master/deploy_scripts/build/venv/local/lib/python2.7/site-packages/redis/connection.py", line 556, in send_command
    self.send_packed_command(self.pack_command(*args))
  File "/home/jbutler/fuck2/dedupe-web-master/deploy_scripts/build/venv/local/lib/python2.7/site-packages/redis/connection.py", line 532, in send_packed_command
    self.connect()
  File "/home/jbutler/fuck2/dedupe-web-master/deploy_scripts/build/venv/local/lib/python2.7/site-packages/redis/connection.py", line 436, in connect
    raise ConnectionError(self._error_message(e))
redis.exceptions.ConnectionError: Error 111 connecting to localhost:6379. Connection refused.

gunicorn error log:

ERROR:app:Exception on /select_fields/ [POST]
Traceback (most recent call last):
  File "/home/jbutler/fuck2/dedupe-web-master/deploy_scripts/build/venv/local/lib/python2.7/site-packages/flask/app.py", line 1817, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/jbutler/fuck2/dedupe-web-master/deploy_scripts/build/venv/local/lib/python2.7/site-packages/flask/app.py", line 1477, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/jbutler/fuck2/dedupe-web-master/deploy_scripts/build/venv/local/lib/python2.7/site-packages/flask/app.py", line 1381, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/home/jbutler/fuck2/dedupe-web-master/deploy_scripts/build/venv/local/lib/python2.7/site-packages/flask/app.py", line 1475, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/jbutler/fuck2/dedupe-web-master/deploy_scripts/build/venv/local/lib/python2.7/site-packages/flask/app.py", line 1461, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/home/jbutler/fuck2/dedupe-web-master/app.py", line 154, in select_fields
    deduper = dedupe.Dedupe(field_defs)
  File "/home/jbutler/fuck2/dedupe-web-master/deploy_scripts/build/venv/local/lib/python2.7/site-packages/dedupe/api.py", line 176, in __init__
    super(DedupeMatching, self).__init__(*args, **kwargs)
  File "/home/jbutler/fuck2/dedupe-web-master/deploy_scripts/build/venv/local/lib/python2.7/site-packages/dedupe/api.py", line 591, in __init__
    self.data_model = DataModel(variable_definition)
  File "/home/jbutler/fuck2/dedupe-web-master/deploy_scripts/build/venv/local/lib/python2.7/site-packages/dedupe/datamodel.py", line 29, in __init__
    field_model = typifyFields(fields)
  File "/home/jbutler/fuck2/dedupe-web-master/deploy_scripts/build/venv/local/lib/python2.7/site-packages/dedupe/datamodel.py", line 91, in typifyFields
    raise TypeError("Incorrect field specification: field "
TypeError: Incorrect field specification: field specifications are dictionaries that must include a type definition, ex. {'field' : 'Phone', type: 'String'}
  • Do you have Redis installed? It seems not, judging by the "Error 111 connecting to localhost:6379. Connection refused." error. – Tommaso Barbugli Aug 26 '14 at 18:56
  • Yes, I used apt-get install redis-server. I can connect to that exact IP and port with redis-cli just fine. I also installed the redis package with pip. Thank you for your response! – bootlear Aug 26 '14 at 19:23

1 Answer


I'm the person mainly responsible for putting that code together, and I think I've figured out what's going on. I last touched that code between the 0.5 and 0.6 releases, so only some of the API changes had been incorporated. I pushed a commit last week (on the 27th) that should address the issue.
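To illustrate the kind of API change involved: the TypeError in the question's log gives {'field' : 'Phone', type: 'String'} as its example, which suggests each field is now described by its own dictionary rather than by a key in one big dictionary. The following is only a sketch under that assumption; to_field_list is a hypothetical helper, not something in dedupe or dedupe-web, and the exact format expected by your installed dedupe version should be checked against its documentation.

```python
def to_field_list(field_names, field_type='String'):
    """Build one {'field': ..., 'type': ...} dict per selected field.

    Hypothetical helper: the shape follows the example printed in the
    TypeError message from the question, not a confirmed dedupe API.
    """
    return [{'field': name, 'type': field_type} for name in field_names]


# Dictionary style, as built in select_fields() in the question:
#   field_defs = {'name': {'type': 'String'}, 'phone': {'type': 'String'}}
# List style, matching the example in the error message:
field_defs = to_field_list(['name', 'phone'])
print(field_defs)

# With dedupe installed, this list would then be passed on, e.g.:
#   deduper = dedupe.Dedupe(field_defs)
```

If pulling the updated commit is not an option, changing how field_defs is built in select_fields() along these lines may be enough to get past this particular exception.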

A couple things to note:

1) The traceback from the "run_queue" process is significant: it means that process was unable to connect to Redis. I realize you can connect to Redis from the command line (according to the comment above), but for some reason this app cannot. From a Python shell, try something like

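>>> from redis import Redis
>>> r = Redis()
>>> r.ping()

Creating the client does not open a connection by itself; the ping() call is what attempts to connect to Redis on the default host and port (localhost, 6379), and it raises a ConnectionError if it cannot.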

2) Since you're running this locally, you may not want the 10,000-row limit on uploaded spreadsheets. It is a rather arbitrary limit that we set for the version deployed at dedupe.datamade.us. To remove it, comment out these lines: https://github.com/datamade/dedupe-web/blob/master/dedupe_utils.py#L57-L60
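For the Redis check in point 1, it can also help to rule out the network layer separately from the redis package. "Error 111" is ECONNREFUSED on Linux, meaning nothing accepted the TCP connection on that host and port. Here is a minimal sketch using only the standard library; can_connect is a hypothetical helper, and the host and port are simply the defaults shown in the traceback.

```python
import socket


def can_connect(host='localhost', port=6379, timeout=2.0):
    """Try a plain TCP connection; return (True, None) or (False, errno).

    errno can be None for some failures, such as a timeout.
    """
    try:
        sock = socket.create_connection((host, port), timeout)
    except socket.error as exc:
        return False, exc.errno
    sock.close()
    return True, None


ok, err = can_connect()
if ok:
    print('TCP connection to localhost:6379 succeeded')
else:
    print('TCP connection failed, errno=%s' % err)
```

Run this from the same virtualenv and as the same user as the run_queue process. If it succeeds there while the app still fails, the app is probably being pointed at a different host or port than the one redis-cli uses.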

Let me know if you're still running into issues after pulling the new commit.