I have two tables (let's say Table A and Table B) that store the status of two independent activities. I have two cron jobs (let's call them Cron A and Cron B) that each run every 30 minutes, offset from each other by 15 minutes. Both have the same logic, updating the data in their table; the only difference is the table they work on.
Let me explain Cron A:
Cron A picks up users with "pending" status in Table A and applies some logic to filter out those that are now "active". It then queries Table B for data on these active users only. Next, it updates each user's status in Table A from pending to active, checks whether the user is active in Table B as well (from the data queried earlier), and stores users that are active in both A and B in a Set for further processing.
The update to Table A happens one record at a time rather than as a bulk update, so it consumes a lot of time.
Note: the users present in Table A are also present in Table B.
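To make the discussion concrete, here is a rough sketch of Cron A's flow in Python. All the names (the `is_now_active` helper, table and column names) are placeholders for illustration, not the actual legacy code:

```python
import sqlite3


def is_now_active(user_id: str) -> bool:
    """Placeholder for the business logic that decides which pending users are active."""
    return True


def run_cron_a(conn: sqlite3.Connection) -> set:
    cur = conn.cursor()
    pending = [row[0] for row in cur.execute(
        "SELECT user_id FROM table_a WHERE status = 'pending'")]
    now_active = [u for u in pending if is_now_active(u)]

    # Query Table B once, up front, for just these users.
    b_status = {}
    if now_active:
        placeholders = ",".join("?" * len(now_active))
        b_status = dict(cur.execute(
            f"SELECT user_id, status FROM table_b WHERE user_id IN ({placeholders})",
            now_active))

    active_in_both = set()
    for user_id in now_active:
        # One UPDATE per user: this per-row loop is the slow part.
        cur.execute("UPDATE table_a SET status = 'active' WHERE user_id = ?",
                    (user_id,))
        conn.commit()
        if b_status.get(user_id) == "active":
            active_in_both.add(user_id)
    return active_in_both
```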
Similar logic applies to Cron B; it just works on the other table.
You can see the problem: if there are too many records to update and, say, Cron A is still running when Cron B starts processing, Cron B will query users that Cron A is still in the middle of updating, i.e., Cron B will end up with stale data.
I have one approach to solve this, but I wanted to know about better or more widely practised solutions to this kind of issue. One constraint: we currently don't have the time or resources to fully optimize the legacy code in these cron jobs to improve their processing time.
I was thinking of having a Redis key that is set to True when either cron job begins updating its table's data. When it's time for the other cron job to run, it first checks this Redis key, and if the value is True it skips processing for that schedule. The same Redis key logic applies to the other cron job as well.
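If you go this route, note that a plain True flag has two gotchas: the read-then-set check is a race of its own, and if a job crashes mid-run the flag stays set and blocks both crons forever. Redis's atomic SET with the NX and EX options covers both. Here is a minimal sketch using redis-py; the key name, TTL, and function names are my own placeholders:

```python
import redis

# decode_responses=True so GET returns str instead of bytes
r = redis.Redis(host="localhost", port=6379, decode_responses=True)

LOCK_KEY = "cron:table-update-lock"  # shared by both crons (placeholder name)
LOCK_TTL = 25 * 60                   # seconds; must outlive the longest run

# Lua script: delete the lock only if we still own it, atomically.
RELEASE_SCRIPT = """
if redis.call('get', KEYS[1]) == ARGV[1] then
    return redis.call('del', KEYS[1])
else
    return 0
end
"""


def run_guarded(cron_name: str, do_work) -> None:
    # SET with nx=True, ex=TTL is atomic: only one cron can acquire the
    # lock, and the TTL guarantees it disappears even if the process dies.
    acquired = r.set(LOCK_KEY, cron_name, nx=True, ex=LOCK_TTL)
    if not acquired:
        holder = r.get(LOCK_KEY)
        # Log/emit a metric here so ElastAlert can pick up the skipped run.
        print(f"{cron_name}: lock held by {holder}, skipping this schedule")
        return
    try:
        do_work()
    finally:
        r.eval(RELEASE_SCRIPT, 1, LOCK_KEY, cron_name)
```

Pick the TTL to comfortably exceed your slowest realistic run; if the lock expires while a job is still writing, you are back to the original race. Checking the stored value before deleting (the Lua script) prevents one job from releasing a lock the other job has since acquired.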
There is not much impact in terms of TAT, since the next run is only 30 minutes away. Moreover, I can set up an ElastAlert rule that fires when a cron job has to be skipped, so anyone can monitor it and trigger the job manually once the other job completes.
I wanted to check whether this is a viable approach.