9

I'm wondering if it's possible to prevent committing duplicates to the database. For example, presume there is a class as follows

class Employee(Base):
   id = Column(Integer, primary_key=True)
   name = Column(String)

If I were to make a series of these objects,

employee1 = Employee(name='bob')
employee2 = Employee(name='bob')

session.add_all([employee1, employee2])
session.commit()

I would like only a single row to be added to the database, and employee1 and employee2 to point to the same object in memory (if possible).

Is there functionality within SQLAlchemy to accomplish this? Or would I need to ensure duplicates don't exist programmatically?

Zev Averbach
  • 1,044
  • 1
  • 11
  • 25
nven
  • 1,047
  • 4
  • 13
  • 22
  • It sounds like you have the wrong primary key. You should make the name the primary key if that's what you want. Though then people can't really change names without incurring a huge multi-table database update as all the foreign keys everywhere have to change. Also, what happens if you try to change a name to an existing one? Would you just want to wipe out the row that's already there? – Omnifarious Jun 13 '17 at 02:13
  • 3
    Creating a unique index by name, even if it's not the primary key, looks like the right way to enforce name uniqueness. – 9000 Jun 13 '17 at 04:36
  • To prevent committing duplicates do what @9000 said. To have the session only create bob once in your second example, see the [unique object recipes](https://bitbucket.org/zzzeek/sqlalchemy/wiki/UsageRecipes/UniqueObject). – Ilja Everilä Jun 13 '17 at 06:01
  • Using a unique constraint or primary key would prevent duplicates from being added, but it would do so by preventing the new rows from being committed. I would like the database to resolve the issue rather than just prevent the commit from happening – nven Jun 13 '17 at 11:44
  • 1
    @nven - And how would the database resolve the issue other than by preventing the commit from happening? – Omnifarious Jun 13 '17 at 13:39
  • 1
    Ideally, I would like the ORM to realize that there are duplicate entries, only commit one, and have all references to the duplicates reference the same object – nven Jun 13 '17 at 19:29

3 Answers3

4

An alternate get_or_create() solution:

from sqlalchemy.orm.exc import NoResultFound
# ...

def get_or_create(self, model, **kwargs):
    """
    Usage:
    class Employee(Base):
        __tablename__ = 'employee'
        id = Column(Integer, primary_key=True)
        name = Column(String, unique=True)

    get_or_create(Employee, name='bob')
    """
    instance = get_instance(model, **kwargs)
    if instance is None:
        instance = create_instance(model, **kwargs)
    return instance


def create_instance(model, **kwargs):
    """create instance"""
    try:
        instance = model(**kwargs)
        sess.add(instance)
        sess.flush()
    except Exception as msg:
        mtext = 'model:{}, args:{} => msg:{}'
        log.error(mtext.format(model, kwargs, msg))
        sess.rollback()
        raise(msg)
    return instance


def get_instance(self, model, **kwargs):
    """Return first instance found."""
    try:
        return sess.query(model).filter_by(**kwargs).first()
    except NoResultFound:
        return

Jagoda Gorus
  • 329
  • 4
  • 18
farax
  • 131
  • 7
3

You could create a class method to get or create an Employee -- get it if it exists, otherwise create:

@classmethod
def get_or_create(cls, name):
    exists = db.session.query(Employee.id).filter_by(name=name).scalar() is not None
    if exists:
        return db.session.query(Employee).filter_by(name=name).first()
    return cls(name=name)


employee1 = Employee(name='bob')
db.session.add(employee1)
employee2 = Employee(name='bob')

employee1 == employee2  # False


bob1 = Employee.get_or_create(name='bob')
if bob1 not in db.session:
    db.session.add(bob1)

len(add_to_session) # 1

bob2 = Employee.get_or_create(name='bob')
if bob2 not in db.session:
    db.session.add(bob2)

len(add_to_session) # 1

bob1 == bob2  # True
Zev Averbach
  • 1,044
  • 1
  • 11
  • 25
  • 2
    credit to https://stackoverflow.com/a/32952421/4386191 for the quick `exists` expression. – Zev Averbach Jun 13 '17 at 03:25
  • This is interesting. I was hoping for a way to resolve this on commit. I presume bob1 would need to first be added to the session before `get_or_create` would return the `bob1` object to the `bob2 = Employee.get_or_create(name='bob')` call? In the current workflow of my package, I first generate a large list of these types of objects and am not interacting with the database at all until commit – nven Jun 13 '17 at 03:34
  • @nven oops, you're right that it has to be added to the session. I added this to the example and showed a pattern that might work for you. No commit needed, just add objects to the session after checking for uniqueness. – Zev Averbach Jun 13 '17 at 04:34
2

There are at least 2 approaches:

  • The database approach: create a relevant primary key; with SQLAlchemy you would define e.g. based on your minimalistic example name = Column('First Name', String(20), primary_key=True)
  • The coding approach: check whether the attribute, set of attributes already exists in the table, otherwise create it. See relevant code examples here.

In terms of performance, I believe the database approach is better. It also is the one which makes more sense.

TechSolomon
  • 425
  • 5
  • 12
tagoma
  • 3,896
  • 4
  • 38
  • 57
  • can't edit the answer above, so will add the updated link here: the article that @tagoma has provided above has moved to: https://github.com/sqlalchemy/sqlalchemy/wiki/UniqueObject – Babak K Feb 25 '22 at 17:23