0

I am trying to insert some data into some stage tables where the insert is taking far too long. For example a table containing up 600000 records is taking nearly an hour to complete. In the select part of the query we are are creating a hash of the columns which is later used for change detection. Due to the Data Warehouse methodology we are using, DataVault, we cannot remove the change hash. When I look at the execution plan I see warnings of type conversion in expression. and example is shown below.

<Warnings>
              <PlanAffectingConvert ConvertIssue="Cardinality Estimate" Expression="CONVERT(varchar(max),[load].[load_acbs_loan_product_dimension].[product_month],0)" />

I have also included a example of an insert statement that we are using:

      INSERT INTO [stage_acbs_balance_category_dimension]
      ( hk_h_balance_category_dimension
      , balance_category_key
      , balance_category_code
      , balance_category_description
      , balance_class_code
      , balance_class_description
      , user_define_code_1
      , user_define_code_2
      , user_define_code_3
      , user_define_code_4
      , user_define_code_5
      , bal_cat_short_name
      , include_bal_in_tax_reporting
      , include_in_billings_statements
      , include_in_past_due_reporting
      , dss_change_hash_acbs_balance_category_dimension_lroc
      , dss_record_source
      , dss_load_date
      , dss_create_time)
      SELECT  CAST(HASHBYTES('sha2_256',
               COALESCE(CAST(load_acbs_balance_category_dimension.balance_category_key AS VARCHAR(MAX)),'null')
               ) AS BINARY(32)) AS hk_h_balance_category_dimension 
           , load_acbs_balance_category_dimension.balance_category_key AS balance_category_key 
           , load_acbs_balance_category_dimension.balance_category_code AS balance_category_code 
           , load_acbs_balance_category_dimension.balance_category_description AS balance_category_description 
           , load_acbs_balance_category_dimension.balance_class_code AS balance_class_code 
           , load_acbs_balance_category_dimension.balance_class_description AS balance_class_description 
           , load_acbs_balance_category_dimension.user_define_code_1 AS user_define_code_1 
           , load_acbs_balance_category_dimension.user_define_code_2 AS user_define_code_2 
           , load_acbs_balance_category_dimension.user_define_code_3 AS user_define_code_3 
           , load_acbs_balance_category_dimension.user_define_code_4 AS user_define_code_4 
           , load_acbs_balance_category_dimension.user_define_code_5 AS user_define_code_5 
           , load_acbs_balance_category_dimension.bal_cat_short_name AS bal_cat_short_name 
           , load_acbs_balance_category_dimension.include_bal_in_tax_reporting AS include_bal_in_tax_reporting 
           , load_acbs_balance_category_dimension.include_in_billings_statements AS include_in_billings_statements 
           , load_acbs_balance_category_dimension.include_in_past_due_reporting AS include_in_past_due_reporting 
           , CAST(HASHBYTES('SHA2_256',
               COALESCE(CAST(load_acbs_balance_category_dimension.balance_category_code AS VARCHAR(MAX)),'null') +'||'+
               COALESCE(CAST(load_acbs_balance_category_dimension.balance_category_description AS VARCHAR(MAX)),'null') +'||'+
               COALESCE(CAST(load_acbs_balance_category_dimension.balance_class_code AS VARCHAR(MAX)),'null') +'||'+
               COALESCE(CAST(load_acbs_balance_category_dimension.balance_class_description AS VARCHAR(MAX)),'null') +'||'+
               COALESCE(CAST(load_acbs_balance_category_dimension.user_define_code_1 AS VARCHAR(MAX)),'null') +'||'+
               COALESCE(CAST(load_acbs_balance_category_dimension.user_define_code_2 AS VARCHAR(MAX)),'null') +'||'+
               COALESCE(CAST(load_acbs_balance_category_dimension.user_define_code_3 AS VARCHAR(MAX)),'null') +'||'+
               COALESCE(CAST(load_acbs_balance_category_dimension.user_define_code_4 AS VARCHAR(MAX)),'null') +'||'+
               COALESCE(CAST(load_acbs_balance_category_dimension.user_define_code_5 AS VARCHAR(MAX)),'null') +'||'+
               COALESCE(CAST(load_acbs_balance_category_dimension.bal_cat_short_name AS VARCHAR(MAX)),'null') +'||'+
               COALESCE(CAST(load_acbs_balance_category_dimension.include_bal_in_tax_reporting AS VARCHAR(MAX)),'null') +'||'+
               COALESCE(CAST(load_acbs_balance_category_dimension.include_in_billings_statements AS VARCHAR(MAX)),'null') +'||'+
               COALESCE(CAST(load_acbs_balance_category_dimension.include_in_past_due_reporting AS VARCHAR(MAX)),'null')
               ) AS BINARY(32)) AS dss_change_hash_acbs_balance_category_dimension_lroc 
           , load_acbs_balance_category_dimension.dss_record_source AS dss_record_source 
           , load_acbs_balance_category_dimension.dss_load_date AS dss_load_date 
           , getdate() AS dss_create_time 
      FROM [load_acbs_balance_category_dimension] load_acbs_balance_category_dimension

I am looking for a way to get rid of the implicit conversions and get the insert to perform. I initially thought of making the column a computed persisted hash but the hash is non-deterministic. Any ideas would be much appreciated.

Eseosa Omoregie
  • 175
  • 2
  • 11
  • `product_month` isn't even mentioned in your sample query, so your time is probably not getting wasted in implicit conversions. SHA2-256 isn't exactly cheap, typically performing 64 rounds of bit shuffling over the input data before producing the resultant hash. Have you considered precomputing the SHA2-256 hashes and including them as columns in your import data file? – AlwaysLearning Sep 13 '21 at 10:07
  • hi there, I just used product_month as an example of the warnings for the execution plan.. There are 7 inserts in total which we are doing. The insert including product month was too large to put in the forum so i chose a smaller query as an example. The data is coming from an oracle database. Do you mean pre computing the sha2 -256 hashes and storing them in staging? – Eseosa Omoregie Sep 13 '21 at 10:16
  • 1
    Why dont you just use `rowversion` for change detection? Or at the very least switch to `MD5` for better performance – Charlieface Sep 13 '21 at 11:44
  • Wherescape is being used to generate the data Vault and so hashing has to be used by default. I switched varchar(max) to varbinary(max) and got significant performance improvement. However indeed using another Hash method will probably get better performance – Eseosa Omoregie Sep 14 '21 at 08:17

1 Answers1

0

I opted to change VARCHAR(MAX) TO VARBINARY(MAX which gave a significant boost in performance, 1 hour down to 40 seconds. I am also looking to change the hashing algorithm.

Eseosa Omoregie
  • 175
  • 2
  • 11
  • concatenating multiple MAX fields can lead to implicit conversion issues. Changing those conversions to a more appropriate length may also increase performance. You can also try CONCAT_WS to pass all the fields in as an array and convert to a character value, but that will ignore nulls. – Wes H Sep 29 '21 at 15:21