Recently, I found a strange behaviour in Azure Synapse Analytics while attempting to tune a stored procedure for better performance.
We have a performance issue when parsing 200 MB of JSON files into a table in Azure Synapse Analytics. I import the JSON string into a table, then run a stored procedure that uses OPENJSON to parse the data. The table that stores the JSON is called `json_dataset`.
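For reference, the setup looks roughly like this. This is a simplified sketch, not the real code: the column names, destination table, and JSON paths are placeholders, and the actual procedure handles many more locations:

```sql
-- Staging table holding the raw JSON documents (simplified)
CREATE TABLE dbo.json_dataset
(
    doc_id INT           NOT NULL,
    doc    NVARCHAR(MAX) NOT NULL
)
WITH (DISTRIBUTION = ROUND_ROBIN, HEAP);

-- Inside the stored procedure, the JSON is shredded with OPENJSON,
-- along these lines (one of many similar statements):
INSERT INTO dbo.destination_table (customer_id, amount)
SELECT j.customer_id, j.amount
FROM dbo.json_dataset AS d
CROSS APPLY OPENJSON(d.doc, '$.orders')
    WITH (
        customer_id INT            '$.customerId',
        amount      DECIMAL(18, 2) '$.amount'
    ) AS j;
```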
The JSON itself is quite nasty: the data for the destination table might be stored in any of 80 different locations in the JSON string, and it contains various nested JSON elements. As a result, the stored procedure is long (around 5,000 lines of T-SQL) and I cannot post it here. Under normal circumstances, it takes around 20 minutes to finish.
While testing various combinations of table distribution types and index types on the `json_dataset` table, I found that the stored procedure takes only 50 seconds (with the same data size) right after the `json_dataset` table is recreated. If I rerun the stored procedure after that, it takes 20 minutes again.
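To be precise about what "recreated" means here, this is roughly what I do between runs (a sketch, assuming the round-robin heap variant from my tests; the distribution and index options vary per test):

```sql
-- Recreate the staging table via CTAS, then swap it into place
CREATE TABLE dbo.json_dataset_new
WITH (DISTRIBUTION = ROUND_ROBIN, HEAP)
AS
SELECT doc_id, doc
FROM dbo.json_dataset;

RENAME OBJECT dbo.json_dataset     TO json_dataset_old;
RENAME OBJECT dbo.json_dataset_new TO json_dataset;
DROP TABLE dbo.json_dataset_old;
```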
Why does Azure Synapse Analytics behave this way?
How can I make subsequent runs of the stored procedure perform as well as the first run, without dropping and recreating the table each time?