
I have a Flask project that relies on Flask-SQLAlchemy and Celery to do a lot of things. Many of the Celery tasks reach out to external APIs, fetch some data, and read/update data on disk. As I scale up the number of tasks being run, I see a lot of 40001 errors such as:

(psycopg2.errors.SerializationFailure) could not serialize access due to read/write dependencies among transactions
DETAIL:  Reason code: Canceled on identification as a pivot, during write.
HINT:  The transaction might succeed if retried.

And other similar errors, all related to failing transactions. We are using serializable transactions in SQLAlchemy.
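For reference, this is roughly how the serializable isolation is configured (illustrative sketch only; the real connection URI and app wiring are redacted):

# Rough sketch of how serializable isolation is enabled
# (illustrative; real connection URI and app setup redacted).
from flask import Flask
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
app.config["SQLALCHEMY_DATABASE_URI"] = "postgresql+psycopg2://..."
app.config["SQLALCHEMY_ENGINE_OPTIONS"] = {
    # every transaction runs at SERIALIZABLE, which is why PostgreSQL
    # raises 40001 SerializationFailure when concurrent transactions
    # have conflicting read/write dependencies
    "isolation_level": "SERIALIZABLE",
}
db = SQLAlchemy(app)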

I've tried rerunning the tasks using Celery's built-in auto-retry features, but the tasks are still failing even after several retries, so I think an architectural change needs to be made. The logic for reading from remote APIs and writing to the database is mingled inside the tasks. My first idea was to separate that logic as much as possible: first read from the database and the API, then dispatch another task that makes the changes to the API/database (sketched below), but I'm not sure this would help. I also don't want to wrap every single call to db.session.query(Object) or similar in try/except clauses if there's a better way.
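To make that idea concrete, the split would look roughly like this (sketch only; fetch_sync_state_task, apply_sync_changes_task, build_sync_payload, and the Ticket model are placeholder names, not existing code):

# Sketch of the read/write split I'm considering (placeholder names,
# not existing code): one task does all the reading, a second task
# does only the writes so the serializable transaction stays short.
@celery_app.task(bind=True)
def fetch_sync_state_task(self, instance_id: int):
    instance = ObjectInstance.query.get(instance_id)
    if not instance:
        return
    # read-only: query local tickets and the remote API, no writes here
    payload = build_sync_payload(instance)  # hypothetical helper
    apply_sync_changes_task.delay(instance_id, payload)


@celery_app.task(
    bind=True,
    # SerializationFailure surfaces as sqlalchemy.exc.OperationalError,
    # so retrying on it here only retries the short write transaction
    autoretry_for=(sqlalchemy.exc.OperationalError,),
    retry_backoff=5,
    retry_jitter=True,
    retry_kwargs={"max_retries": 5},
)
def apply_sync_changes_task(self, instance_id: int, payload: dict):
    # write-only: apply the already-computed state changes
    for name, (status, reason) in payload.items():
        ticket = db.session.query(Ticket).filter_by(name=name).one()
        ticket.update_state(status, reason)
    db.session.commit()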

My current code looks like this (heavily redacted):

@celery_app.task(
  bind=True,
  autoretry_for=(
      HTTPError,
      sqlalchemy.exc.OperationalError,
      sqlalchemy.exc.IntegrityError,
  ),
  retry_backoff=5,
  retry_jitter=True,
  retry_kwargs={"max_retries": 5},
)
def sync_db_objects_task(self, instance_id: int):
    instance = ObjectInstance.query.get(instance_id)
    if not instance:
        return

    # Run a query fetching objects of interest
    latest_local_tickets = db.session.query(..)

    if all([t.deleted for t in latest_local_tickets]):
        return

    remote_config = (instance.config.address, instance.config.port,
                     instance.config.username, instance.config.password)

    # determine which ticket objects are out of sync
    out_of_sync = out_of_sync_tickets(latest_local_tickets, remote_config)
    successes, fails = [], []
    for result in out_of_sync:
        db_object = latest_local_tickets[result.name]
        try:
            # sync: connect and push the update to the remote side
            status, reason = connect.sync(db_object)
            successes.append(db_object.name)
        except HTTPError as e:
            status, reason = e.status_code, e.msg
            fails.append(db_object.name)
        db_object.update_state(status, reason)
        db.session.add(db_object)
        db.session.commit()
    logger.info(f"successes: {successes}")
    logger.info(f"fails: {fails}")

It usually fails in the call to update_state, which looks like this:

@declarative_mixin
class ReplicationMixin:
    status = Column(types.Text, default="unknown", nullable=False)
    status_reason = Column(types.Text, nullable=True)
    last_sync_date = Column(types.DateTime, default=None)

    def update_state(self, status: str, reason: str = None):
        self.status = status if status else "fail"
        self.status_reason = parse_reason(reason)
        self.last_sync_date = datetime.utcnow()

And fails like this:

[2022-07-29 09:22:11,305: ERROR/ForkPoolWorker-8] (psycopg2.errors.SerializationFailure) could not serialize access due to read/write dependencies among transactions
DETAIL:  Reason code: Canceled on identification as a pivot, during write.
HINT:  The transaction might succeed if retried.

[SQL: UPDATE tickets SET updated=%(updated)s, last_sync_date=%(last_sync_date)s WHERE tickets.id = %(tickets_id)s]
[parameters: {'updated': datetime.datetime(2022, 7, 29, 9, 22, 11, 304293), 'last_sync_date': datetime.datetime(2022, 7, 29, 9, 22, 11, 303875), 'tickets_id': 124}]
(Background on this error at: https://sqlalche.me/e/14/e3q8)

But I think it's a red herring, as the row is probably also being read somewhere else.

Can I leverage db.session.refresh(object) or something else to make sure I commit and re-read data before changing it (a rough sketch of what I mean is below), or is there some other way to alleviate this issue?
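Something along these lines is what I have in mind (untested sketch, not something I know to be a fix):

# Untested sketch of what I mean by committing and re-reading before
# the write, so the UPDATE is based on the latest committed row state.
db.session.commit()               # end the current long-running transaction
db.session.refresh(db_object)     # re-read the row in a fresh transaction
db_object.update_state(status, reason)
db.session.commit()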

I've been stumped on this issue for a week and I can't seem to figure out what the best way forward is. Very appreciative of any help I can get.
