I have 7 reports that are downloaded daily, late at night.

These reports can be downloaded as CSV or XML. I am downloading them in CSV format as it is more memory efficient.

This process runs in the background and is managed by Hangfire.

After they are downloaded, I use Dapper to run a stored procedure that inserts/updates/deletes data using MERGE statements. This stored procedure has seven table-valued parameters.

Instead of deleting a record, I update its IsActive column to false.
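
Roughly, for one of the seven reports, the pattern inside the procedure looks like this (the type, table, and column names here are simplified illustrations, not the real schema):

```sql
-- Illustrative table-valued parameter type for one report.
CREATE TYPE dbo.ReportRowType AS TABLE
(
    Code  NVARCHAR(50)  NOT NULL PRIMARY KEY,  -- the per-row unique code
    Value NVARCHAR(200) NULL
);
GO

CREATE PROCEDURE dbo.UpsertReport
    @Rows dbo.ReportRowType READONLY
AS
BEGIN
    SET NOCOUNT ON;

    MERGE dbo.Report AS target
    USING @Rows AS source
        ON target.Code = source.Code
    WHEN MATCHED THEN
        UPDATE SET target.Value    = source.Value,
                   target.IsActive = 1
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (Code, Value, IsActive)
        VALUES (source.Code, source.Value, 1)
    -- instead of DELETE: deactivate rows missing from the incoming report
    WHEN NOT MATCHED BY SOURCE THEN
        UPDATE SET target.IsActive = 0;
END;
```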

Note that 2 reports have more than 1 million records.

I am getting timeout exceptions only on Azure SQL; against a regular SQL Server instance it works fine. As a workaround, I have increased the command timeout to 1000 seconds for this query.

This app is running in Azure on the S2 tier.

I have considered sending XML instead, but I have found that SQL Server is slow at processing XML, which would be counterproductive.

I also cannot use SqlBulkCopy on its own, as I have to update based on certain conditions.

Also note that more reports will be added in the future.

Also, when a new report is added there is a large number of inserts; when a previously added report is run again, the operations are mostly updates.

These tables currently do not have any indexes other than the clustered integer primary key.

Each row has a unique code. This code is used to determine whether to insert, update, or delete the row.

Can you recommend a way to increase performance?

  • Without knowing more, it's hard to tell what will work best, but I'd start with: adding an index on your unique code column, followed by using bulk copy to insert the new data into a temporary table, followed by a merge into your regular table (sketched below). – ESG May 26 '19 at 17:05
  • Yes, I am trying the staging table option to check whether it increases performance. – Khalil May 26 '19 at 18:00
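
For reference, ESG's two suggestions might be sketched as follows, reusing the illustrative names from the question; in a real implementation the staging table would be bulk-loaded with SqlBulkCopy from the app:

```sql
-- 1) Index the unique code column so the MERGE can seek instead of scan.
CREATE UNIQUE NONCLUSTERED INDEX IX_Report_Code
    ON dbo.Report (Code);

-- 2) Bulk-load raw rows into a staging heap (via SqlBulkCopy), index it,
--    then merge from the staging table instead of a TVP.
CREATE TABLE dbo.Report_Staging
(
    Code  NVARCHAR(50)  NOT NULL,
    Value NVARCHAR(200) NULL
);

CREATE UNIQUE CLUSTERED INDEX IX_Report_Staging_Code
    ON dbo.Report_Staging (Code);

MERGE dbo.Report AS target
USING dbo.Report_Staging AS source
    ON target.Code = source.Code
WHEN MATCHED THEN
    UPDATE SET target.Value = source.Value, target.IsActive = 1
WHEN NOT MATCHED BY TARGET THEN
    INSERT (Code, Value, IsActive) VALUES (source.Code, source.Value, 1)
WHEN NOT MATCHED BY SOURCE THEN
    UPDATE SET target.IsActive = 0;

TRUNCATE TABLE dbo.Report_Staging;
```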

1 Answer


Is your source sending the whole data set each time, whether rows are new or updated? Since you say the unique code decides whether to insert/update/delete, I assume you are only processing changes (the delta); if not, that is one area to address. Another is to consider parallelism: I think you would then need a separate stored procedure for each table, so that non-dependent tables can be processed together.
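
As a rough illustration (names are hypothetical, and the bodies would follow the same MERGE pattern sketched in the question):

```sql
-- One narrowly-scoped procedure per report table; non-dependent
-- tables such as these can then be merged by concurrent jobs.
CREATE PROCEDURE dbo.UpsertReportA @Rows dbo.ReportRowType READONLY
AS
BEGIN
    SET NOCOUNT ON;
    -- MERGE dbo.ReportA from @Rows, as in the question's sketch
END;
GO

CREATE PROCEDURE dbo.UpsertReportB @Rows dbo.ReportRowType READONLY
AS
BEGIN
    SET NOCOUNT ON;
    -- MERGE dbo.ReportB from @Rows, as in the question's sketch
END;
```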

Tony Thomas
  • That is an option, but I think batching (sketched after these comments) will result in better performance. SQL Server already parallelizes internally as long as there is no table lock involved. – Khalil May 26 '19 at 18:07
  • You mean that sequential statements processing different tables in a single stored procedure are executed in parallel? – Tony Thomas May 26 '19 at 19:30
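
For reference, the batching Khalil mentions might look like merging the staging rows in fixed-size chunks, so that each MERGE transaction stays small (the chunk size and all names are illustrative; note that the soft-delete pass cannot run per chunk, since each chunk sees only part of the incoming data):

```sql
-- Merge staging rows in chunks to keep each transaction small.
DECLARE @BatchSize INT = 50000;
DECLARE @Chunk TABLE
(
    Code  NVARCHAR(50)  NOT NULL PRIMARY KEY,
    Value NVARCHAR(200) NULL
);

WHILE 1 = 1
BEGIN
    DELETE FROM @Chunk;

    -- Pull one chunk out of the staging table.
    DELETE TOP (@BatchSize)
    FROM dbo.Report_Staging
    OUTPUT deleted.Code, deleted.Value INTO @Chunk;

    IF NOT EXISTS (SELECT 1 FROM @Chunk) BREAK;

    MERGE dbo.Report AS target
    USING @Chunk AS source
        ON target.Code = source.Code
    WHEN MATCHED THEN
        UPDATE SET target.Value = source.Value, target.IsActive = 1
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (Code, Value, IsActive)
        VALUES (source.Code, source.Value, 1);
    -- NOT MATCHED BY SOURCE is omitted on purpose: deactivation must run
    -- once at the end, against the full set of incoming codes.
END;
```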