I'm developing a parcel tracking system and thinking about how to improve its performance.

Right now we have one table in Postgres named parcels, containing things like the id, the last known position, etc. Every day about 300,000 new parcels are added to this table. The parcel data is taken from an external API. We need to track every parcel's position as accurately as possible and minimize the time between API calls for a given parcel.

Given these requirements, what would you suggest for the project architecture?
Right now the only solution I can think of is a producer-consumer pattern: have one process selecting all records from the parcels table in an infinite loop, and then distribute the data-fetching tasks with something like Celery.
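To make the idea concrete, here is a minimal sketch of that producer loop, assuming a Redis broker, psycopg2, and a hypothetical fetch_parcel_position task that would call the external API; all names and connection strings are made up:

```python
import time

import psycopg2
from celery import Celery

app = Celery("tracker", broker="redis://localhost:6379/0")  # assumed broker

@app.task
def fetch_parcel_position(parcel_id):
    # Hypothetical consumer: would call the external API for this parcel
    # and update its row in the parcels table.
    ...

def producer_loop():
    conn = psycopg2.connect("dbname=parcels_db")  # made-up connection string
    while True:
        with conn.cursor() as cur:
            # Select every parcel and enqueue one fetch task per row.
            cur.execute("SELECT id FROM parcels")
            for (parcel_id,) in cur:
                fetch_parcel_position.delay(parcel_id)
        time.sleep(60)  # arbitrary pause between full passes over the table

if __name__ == "__main__":
    producer_loop()
```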
The major downsides of this solution are:
- possible deadlocks, as fetching data about the same parcel can happen at the same time on different machines;
- the need to control the queue size.