I have a use case where some of our Redshift tables are used by multiple data scientists at the same time for tuning. When they tune and import data concurrently, we end up with missing rows and duplicated rows.
A while ago, the explicit lock was removed from the table to reduce the number of loads that would hang on those locks.
I'm assuming that removing this explicit lock is what's causing the duplicated and missing rows.
As a path forward, I was thinking about using Kinesis, Redis, or something similar to batch these writes into a single import instead of many individual inserts (which aren't great for Redshift anyway). Or is the real solution to add the explicit locking back to the table and deal with the hanging loads?
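For reference, here is a minimal sketch of what the "explicit locking" path could look like: serialize concurrent loads by taking a table lock inside a short transaction, staging the data first, and doing a delete-then-insert so reruns don't duplicate rows. The table names, key column, S3 path, and IAM role below are all placeholders, not anything from our actual setup:

```sql
-- Hypothetical load wrapped in a transaction with an explicit lock.
-- "target_table", "staging_table", and "id" are placeholder names.
BEGIN;

-- Block other writers on the target until COMMIT; readers of committed
-- data are unaffected, but concurrent loads will queue here.
LOCK target_table;

-- Land the batch in a staging table first (COPY, not row-by-row INSERTs).
COPY staging_table
FROM 's3://example-bucket/example-prefix/'
IAM_ROLE 'arn:aws:iam::123456789012:role/example-role'
FORMAT AS CSV;

-- Delete any rows being re-imported, then insert the batch, so a
-- retried or overlapping load doesn't produce duplicates.
DELETE FROM target_table
USING staging_table
WHERE target_table.id = staging_table.id;

INSERT INTO target_table
SELECT * FROM staging_table;

TRUNCATE staging_table;
COMMIT;
```

Keeping the lock scoped to a single short transaction like this bounds how long other loads hang, which is the trade-off being weighed here.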
Any guidance would be appreciated. Thanks.
Update: putting the explicit locks back on works, but other procedures hang behind the table/proc locks and everything slows down significantly.