I have a use case in which four Spark jobs need to update the same rows of a PostgreSQL table at the same time. The Spark jobs use 'executeUpdate()' from JDBC to update the table. They are streaming jobs, so they need to store data continuously.
I am thinking of relying on a row-based lock or a table-based lock to avoid deadlock issues. Are my assumptions below correct?
Row-based lock
In each Spark job, I add 'BEGIN' and 'COMMIT' around the SQL statement that runs in executeUpdate. Then I add a 'SELECT xxx FOR UPDATE' to explicitly lock the impacted rows in each Spark job. In my understanding, the next Spark job will automatically wait for the row lock to be released before executing its UPDATE statement on the rows still held by the previous job. A sketch of what I mean is below.
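Roughly, each transaction would look like this (the table and column names, metrics/id/value, are just placeholders for illustration):

    BEGIN;
    -- lock the rows we are about to change; a concurrent transaction that
    -- selects the same ids FOR UPDATE will block here until we COMMIT
    SELECT id FROM metrics WHERE id = ANY (ARRAY[1, 2, 3]) FOR UPDATE;
    -- update the rows we now hold locks on
    UPDATE metrics SET value = value + 10 WHERE id = ANY (ARRAY[1, 2, 3]);
    COMMIT;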
Table-based lock
In each Spark job, I add 'BEGIN', 'LOCK TABLE table', and 'COMMIT' around the SQL statement that runs in executeUpdate. In my understanding, the explicit 'LOCK' command locks the whole table for the duration of the transaction. When the second Spark job tries to lock the table, it will wait until the first Spark job releases the lock. Again, a sketch is below.
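Something along these lines (same placeholder table as above):

    BEGIN;
    -- with no lock mode specified, PostgreSQL takes ACCESS EXCLUSIVE,
    -- the strictest mode; other transactions touching the table queue
    -- behind this one until COMMIT
    LOCK TABLE metrics;
    UPDATE metrics SET value = value + 10 WHERE id = ANY (ARRAY[1, 2, 3]);
    COMMIT;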
Would the above approaches resolve the deadlock issues? Since the Spark jobs are streaming jobs, I would prefer the row-based one, which should reduce waiting time. Are there better solutions? Thanks!