I am trying to optimize my code by reducing the number of database calls. I have the following models:

from sqlalchemy import Column, ForeignKey, Text
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship

Base = declarative_base()


class PageCategory(Base):
    # Association table linking pages to categories
    __tablename__ = 'page_category'
    category_id = Column(Text, ForeignKey('category.category_id'), primary_key=True)
    page_id = Column(Text, ForeignKey('page.page_id'), primary_key=True)


class Category(Base):
    __tablename__ = 'category'
    category_id = Column(Text, primary_key=True)
    name = Column(Text, nullable=False)
    pages = relationship('Page', secondary='page_category')


class Page(Base):
    __tablename__ = 'page'
    page_id = Column(Text, primary_key=True)
    name = Column(Text, nullable=False)
    categories = relationship('Category', secondary='page_category')

The code receives a stream of Facebook likes, and each one comes with a Page, a Category, and the obvious relation between them, a PageCategory. I need a way to bulk-create, if they do not already exist, the various Pages, Categories, and the relations between them. Given that the code needs to be fast, I cannot afford a round trip to the database when creating every object.

page = Page(page_id='1', name='1')
category = Category(category_id='2', name='2')
session.add(page)
session.add(category)
session.commit()
# ...same for PageCategory

Now, given that page_id and category_id are primary keys, the database will raise an IntegrityError if we try to insert duplicates, but catching that per object is still a round-trip dance (sketched below). I would need a utility that receives a list of objects, say session.bulk_save_objects([page1, page2, category1, category2, page_category1, page_category2]), but creates only the objects that do not raise an IntegrityError and ignores the ones that do.
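For reference, this is the per-object pattern I am trying to avoid. It is a minimal sketch assuming SQLAlchemy's begin_nested() (SAVEPOINT) support, with page1, category1, etc. being the objects built from the stream:

from sqlalchemy.exc import IntegrityError

# One SAVEPOINT round trip per object: correct, but slow.
for obj in [page1, page2, category1, category2, page_category1, page_category2]:
    try:
        # begin_nested() issues a SAVEPOINT, so a duplicate only rolls
        # back this one insert instead of the whole transaction.
        with session.begin_nested():
            session.add(obj)
            session.flush()
    except IntegrityError:
        pass  # duplicate primary key: skip this object
session.commit()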

A utility like that would avoid database I/O for every triple of objects. I do not know if this is possible or if it exceeds SQLAlchemy's capabilities.
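For concreteness, this is the kind of single-statement behaviour I am after. The sketch below is only an illustration: it assumes a PostgreSQL backend and its INSERT ... ON CONFLICT DO NOTHING support (SQLAlchemy 1.1+), and the helper name bulk_save_ignore_duplicates is mine, not a SQLAlchemy API. Whether something like it can be done portably, or at the ORM level, is essentially my question.

from sqlalchemy.dialects.postgresql import insert


def bulk_save_ignore_duplicates(session, model, rows):
    """Insert rows (a list of dicts) in a single statement, silently
    skipping any row whose primary key already exists."""
    stmt = insert(model.__table__).values(rows).on_conflict_do_nothing()
    session.execute(stmt)


# One round trip per table instead of one per object:
bulk_save_ignore_duplicates(session, Page, [{'page_id': '1', 'name': '1'}])
bulk_save_ignore_duplicates(session, Category, [{'category_id': '2', 'name': '2'}])
bulk_save_ignore_duplicates(session, PageCategory, [{'page_id': '1', 'category_id': '2'}])
session.commit()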
