Redshift removal of explicit locks lead to missing rows/duplicated rows. Best resolution?

Question

I have a use case where some of our redshift tables are used by multiple data scientists at the same time for tuning. If they are tuning at the same time and import data at the same time, we end up with missing rows and duplicated rows.

Awhile ago, they had removed the explicit lock from the table to reduce the amount of loads that would hang on those locks.

I'm assuming that this removal of the explicit lock is causing the duplicated and missing rows.

In terms of path forward, I was thinking about having kinesis, redis, or something similar to batch these to be one import instead of inserts (not great for redshift anyways). Or if the real solution is to add explicit locking back to the table and deal with the hanging loads.

Any guidance would be appreciated. Thanks

Putting the explicit locks back on works, but other procedures hang behind the table/proc locks and slows down significantly.

score 0 · Answer 1 · answered Jan 19 '23 at 17:48

0

Yes removing locks is causing this and they shouldn’t do it. Tell them to stop.

They are likely running into this because they aren’t COMMITting their changes. Changing their connections to AUTOCOMMIT might fix things. If one person changes a table but doesn’t COMMIT the change then they have a local copy and a lock until they do. If they never disconnect then this situation can last forever. If many are doing this then you have a mess of local copies and locks waiting to be resolved but nobody COMMITs.

When people come from lock-on-write databases they can get confused about what is happening. Read up on MVCC database coherency.

answered Jan 19 '23 at 17:48

Bill Weiner

8,835
2
7
18

Makes sense. So then to accommodate them being able to upload simultaneously while maintaining best practices with LOCKS, does it make the most sense to have something like kinesis to batch these imports together? They should be able to tune at the same time, but want to make sure we are maintaining the necc locks on the table. – jimhens Jan 19 '23 at 19:04
Shouldn’t need to. Redshift coherency will keep track of the changes as they happen and make sure the right things happen in the end. The locks are there to prevent non-ACID actions from occurring. As long as they commit their changes as they go things should be seamless. – Bill Weiner Jan 19 '23 at 19:42

Redshift removal of explicit locks lead to missing rows/duplicated rows. Best resolution?

1 Answers1