How do i prevent duplicate records on rethinkdb using python

Question

I want to only insert records on to a table if there arent previous records like for example i have a table users and i want to only add unique users with phone numbers

{"name":"john smith", "Age":30 , "phone_number": "556"}

if another user with the same phone number were to be added i would't want to be able to add them. For now i am trying to use filter to figure if i have existing records with that phonenumber

r.db(DB).table('users').filter(
    {'phone_number': new_practice['phone_number']},       
    default=r.error()).run(conn)

that is how i tried to use that method to find if the user existed but that didn't seem to give me what i wanted

Roy Hvaara · Accepted Answer · 2015-11-06T07:00:24.357

There is no good way to do this safely with RethinkDB. This has been discussed in an issue on github, and has to do with unique secondary indexes as well as guaranteeing atomaticity. In essence you have to do two operations. First search to see if the phone number is already present, and then add the record if it isn't.

RethinkDB does not support unique secondary indexes even for non-sharded tables. (Source: Using secondary indexes in RethinkDB)

So we have a race-condition. Two requests could be run at the same time both getting back that the phone number is not present, and then both end up inserting the new record. The following unsafe code will do this:

new_record = {"name": "john smith", "Age": 30, "phone_number": "556"}
filter_predicate = {"phone_number": new_record["phone_number"]}
test = r.table('users').filter(filter_predicate).count().eq(0)
r.branch(test, r.table('users').insert(new_record), None).run()

And here's the output from running it:

In [92]: for document in r.table('users').run(): print(document)

In [93]: new_record = {"name": "john smith", "Age": 30, "phone_number": "556"}

In [94]: filter_predicate = {"phone_number": new_record["phone_number"]}

In [95]: test = r.table('users').filter(filter_predicate).count().eq(0)

In [96]: r.branch(test, r.table('users').insert(new_record), None).run()
Out[96]: 
{u'deleted': 0,
 u'errors': 0,
 u'generated_keys': [u'2c0ffb27-cfdb-44e7-a416-4f7be5d97ea9'],
 u'inserted': 1,
 u'replaced': 0,
 u'skipped': 0,
 u'unchanged': 0}

In [97]: for document in r.table('users').run(): print document
{u'phone_number': u'556', u'Age': 30, u'id': u'2c0ffb27-cfdb-44e7-a416-4f7be5d97ea9', u'name': u'john smith'}

In [98]: r.branch(test, r.table('users').insert(new_record), None).run()

In [99]: for document in r.table('users').run(): print document
{u'phone_number': u'556', u'Age': 30, u'id': u'2c0ffb27-cfdb-44e7-a416-4f7be5d97ea9', u'name': u'john smith'}

One work-around that comes to mind is to try and fail gracefully. That is, we do the initial check to see if the document is there, if it's not we add it. Then we have to check for duplicates. And clean up if duplicates are found. We need some sort of tie-breaker, for example by the time the document was inserted (earliest is the winner). Any queries that expects our documents to have unique phone numbers also needs to take this into account. In case of two documents with the same phone number order by time ascending and select the first one.

How do i prevent duplicate records on rethinkdb using python

1 Answers1