Recently, I found a strange behaviour in Azure Synapse Analytics while attempting to tune a stored procedure for better performance.
We have a performance issue when parsing 200 MB of JSON files into a table in Azure Synapse Analytics. I import the JSON string into a table, then run a stored procedure that uses OPENJSON to parse the data. The table that stores the JSON is called `json_dataset`.
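For reference, the setup looks roughly like this. This is a simplified sketch, not the real code: the column names, destination table, and JSON paths are placeholders, and the actual procedure handles many more locations:

```sql
-- Staging table holding the raw JSON documents (simplified)
CREATE TABLE dbo.json_dataset
(
    doc_id INT           NOT NULL,
    doc    NVARCHAR(MAX) NOT NULL
)
WITH (DISTRIBUTION = ROUND_ROBIN, HEAP);

-- Inside the stored procedure, the JSON is shredded with OPENJSON,
-- along these lines (one of many similar statements):
INSERT INTO dbo.destination_table (customer_id, amount)
SELECT j.customer_id, j.amount
FROM dbo.json_dataset AS d
CROSS APPLY OPENJSON(d.doc, '$.orders')
    WITH (
        customer_id INT            '$.customerId',
        amount      DECIMAL(18, 2) '$.amount'
    ) AS j;
```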
The JSON itself is quite nasty: the data for the destination table might be stored in any of 80 different locations in the JSON string, and it contains various nested JSON elements. As a result, the stored procedure is long (around 5,000 lines of T-SQL) and I cannot post it here. Under normal circumstances, it takes around 20 minutes to finish.
While testing various combinations of table distribution types and index types on the `json_dataset` table, I found that the stored procedure takes only 50 seconds (with the same data size) right after the `json_dataset` table is recreated. If I rerun the stored procedure after that, it takes 20 minutes again.
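To be precise about what "recreated" means here, this is roughly what I do between runs (a sketch, assuming the round-robin heap variant from my tests; the distribution and index options vary per test):

```sql
-- Recreate the staging table via CTAS, then swap it into place
CREATE TABLE dbo.json_dataset_new
WITH (DISTRIBUTION = ROUND_ROBIN, HEAP)
AS
SELECT doc_id, doc
FROM dbo.json_dataset;

RENAME OBJECT dbo.json_dataset     TO json_dataset_old;
RENAME OBJECT dbo.json_dataset_new TO json_dataset;
DROP TABLE dbo.json_dataset_old;
```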
Why does Azure Synapse Analytics behave this way?
How can I make subsequent runs of the stored procedure perform as well as the first run, without dropping and recreating the table each time?