3

I've got a Talend job running with a couple of dataflows running in parallel against a Snowflake database. An update statement against Table A is causing an update on Table B to fail with the following error:

Transaction 'uuid-of-transaction', id 'a-very-long-integer-id', is being committed, SQL execution canceled.

Call END_OPERATION(999,'String1','String2','String3','String4','Success','0')

UPDATE TableB SET BATCH_KEY =  1234, LOAD_DT = current_timestamp::timestamp_ntz, KEY_HASH = MD5(TO_VARCHAR(ARRAY_CONSTRUCT(col1))), ROW_HASH = MD5(TO_VARCHAR(ARRAY_CONSTRUCT(col2, col3))) WHERE BATCH_KEY = -1 OR BATCH_KEY IS NULL;

The code for END_OPERATION is here:

var cmd = 
 "CALL END_OPERATION(:1,:2,:3,:4,:5,:6,null);";
    try {
    
    snowflake.execute (
        {sqlText: cmd,binds: [BATCH_KEY,ENTITY,LAYER,SRC,OPERATION,OPERATION_STATUS].map(function(param){return param === undefined ? null : param})},
        );
    return "Succeeded.";  
    
    }
catch (err)  {
    return "Failed: " + err;   
    }


var cmd = 
 "UPDATE TableA SET OPERATION_STATUS=:6,END_DT=current_timestamp,ROW_COUNT=IFNULL(:7,ROW_COUNT) WHERE BATCH_KEY=:1 AND ENTITY_NAME=:2 AND LAYER_NAME=:3 AND SRC=:4 AND OPERATION_NAME=:5";
    try {
    
    snowflake.execute (
        {sqlText: cmd,binds: [BATCH_KEY,ENTITY,LAYER,SRC,OPERATION,OPERATION_STATUS,ROW_COUNT].map(function(param){return param === undefined ? null : param})},
        );
    return "Succeeded.";  
    
    }
catch (err)  {
    return "Failed: " + err;   
    }

I'm failing to understand why the UPDATE statement against TableB is getting killed. It's getting killed nearly immediately.

Nick Heidke
  • 2,787
  • 2
  • 34
  • 58

1 Answers1

2

Here we need to review the flow of all SQL statements coming from the Talend job within the same session in which the failing SQL command is run as well as all the statements coming from the other parallel job.

From the Query History we can get the SessionID of the session. From the History section of the Snowflake UI we can make a search based upon the SessionID. This will list all the commands run through this particular session. We can review all the commands in their chronological order by sorting over the start_date column and try to observe the sequence of SQL statements.

Your point is indeed valid that an update on TableA should not affect an Update on TableB but after reviewing all the statements of both the sessions (we read that the Talend job is running a couple of dataflows in parallel) we may come across some SQL statement in one session which has taken a lock on tableB before the Update command is submitted against it from the other session.

Another thing which can be reviewed here is how the transaction is managed by the workflow. Within the same list of SQL queries in that session we need to check for any statements which sets the parameter Autocommit at the session level. If Autocommit it set to FALSE at the start of the session then the session will not release any of the table locks until an explicit commit is submitted.

Since the situation here sounds a bit unusual and complex, we may have to dig a little more deeper to review the execution logs of both the queries and for that we may have to contact the Snowflake support.

Haseeb
  • 196
  • 1
  • That gives me a good place to start from, I'll be digging in more. I know Talend is setting Autocommit to false in some cases, so that's likely the culprit. – Nick Heidke Jul 16 '20 at 15:56
  • You nailed it Haseeb. Talend was setting autocommit to false on a session, then running two update statements back to back, the second getting cancelled. To resolve this, I had each Talend component create their own session for individual update statements. – Nick Heidke Jul 23 '20 at 16:26
  • Thanks for the confirmation Nick. I am glad that the pointers have helped. – Haseeb Jul 25 '20 at 01:13