
We are a small, family-run business in Miami and currently use a few products in the Azure cloud. We have a standard Azure SQL database, but the performance is poor.

We run a daily report over a transactional table of 50 million rows, but this can take HOURS to complete. We increased the DTUs to 100, which made a huge improvement, but it was still too slow. We increased this again to 200 DTUs, which is even better, but still slower than we would like. Increasing the DTUs to 400 seems to be the magic number, as the report returns in a reasonable amount of time.

To get reasonable performance from the server, the pricing tier needs to be at 400 DTUs. This costs $588 a month! That is an insane cost for a single hosted database. Even if we scale the DTUs up in the morning and back down in the evening, it is still expensive.
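
For reference, the scale-up/scale-down approach can be scripted rather than done by hand in the portal. A minimal T-SQL sketch, assuming a database named MyDatabase and assuming S6 (400 DTUs) and S2 (50 DTUs) as the target Standard-tier objectives; the scale operation runs asynchronously and existing connections may be dropped when it completes:

    -- Scale up before the daily report runs (S6 = 400 DTUs in the Standard tier).
    -- Typically executed against the logical server, e.g. while connected to master.
    ALTER DATABASE [MyDatabase] MODIFY (SERVICE_OBJECTIVE = 'S6');

    -- Scale back down once the report has finished (S2 = 50 DTUs).
    ALTER DATABASE [MyDatabase] MODIFY (SERVICE_OBJECTIVE = 'S2');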

The data cannot be optimized any more than it already has been.

Would there be a better solution for hosting this data?

TheCloudy
  • Instead of adding iron, fix the slow query. Generating reports over 50M rows sounds ... inefficient. Who's going to read all 50M rows? *Aggregating* them can be accelerated immensely by using the proper structures and indexes, e.g. columnstore indexes (a sketch follows these comments). Power BI is the dedicated reporting tool. You don't need to push all 50M rows every day to Power BI, just the new data – Panagiotis Kanavos Feb 03 '22 at 14:36
  • @PanagiotisKanavos The report only brings back what is needed. Indexes are applied where necessary. The query is aggregating some values. I don't think this is a query problem; it is an engine bottleneck caused by low DTUs. – TheCloudy Feb 03 '22 at 14:53
  • That wouldn't be a problem if the query didn't try to recalculate stuff that could be precalculated. It's not how much data you return, it's how much data you need to read. If you have to read 50M rows every time, you're doing it wrong. That's what Power BI is for by the way - quickly generate reports from lots of data *without* reloading everything every time. – Panagiotis Kanavos Feb 03 '22 at 15:03
  • In any case, this question is already out of scope. You're asking for hosting recommendations. SO is for asking how to improve the report itself and the queries, and use *a better schema* suitable for reporting. Your reporting was already inefficient, but you didn't notice because you couldn't see the cost of the extra time it took to run when you ran the database locally. 50M rows isn't a lot of data for a properly designed star schema. Columnstore indexes would improve performance 100x and reduce size 2-3x, especially over a star schema – Panagiotis Kanavos Feb 03 '22 at 15:05
  • Azure Data Lake is another option for *reporting*. Instead of designing a proper reporting database up front, put the data in a semistructured form in a data lake and process it on the fly. You pay for storage and analytics usage only. – Panagiotis Kanavos Feb 03 '22 at 15:11
  • No matter what reporting solution is used, every day only the new data should be sent to the reporting database, not all 50M rows. That's how data warehouses in SQL Server have worked since 2000. This results in far less IO and CPU usage over time – Panagiotis Kanavos Feb 03 '22 at 15:15
  • Are you using columnstores? Are the rowgroups compressed properly? – Conor Cunningham MSFT Feb 04 '22 at 13:26
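
To illustrate the columnstore and incremental-load suggestions from the comments above, here is a minimal T-SQL sketch. The dbo.Transactions and dbo.DailySummary tables and all column names are hypothetical stand-ins for the real schema:

    -- Hypothetical table/column names. A nonclustered columnstore index lets the
    -- daily aggregation scan compressed column segments instead of full rows.
    CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_Transactions
        ON dbo.Transactions (TransactionDate, CustomerId, Amount);

    -- Load only yesterday's rows into a small summary table and report off that,
    -- rather than re-reading all 50M rows every day.
    INSERT INTO dbo.DailySummary (TransactionDate, CustomerId, TotalAmount, TransactionCount)
    SELECT TransactionDate, CustomerId, SUM(Amount), COUNT(*)
    FROM dbo.Transactions
    WHERE TransactionDate = DATEADD(DAY, -1, CAST(GETDATE() AS DATE))
    GROUP BY TransactionDate, CustomerId;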

1 Answer


You could store the data in ADLS (preferably in Parquet format) and create external tables on an Azure Synapse serverless SQL pool. This solution is relevant only when you want to create reports; it is not meant for any transactional requirements.
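
A minimal sketch of what that could look like on a serverless SQL pool, assuming a hypothetical storage account, container, folder, and column list; adjust every name and path to the actual lake layout:

    -- Hypothetical external data source pointing at the ADLS container.
    CREATE EXTERNAL DATA SOURCE TransactionsLake
    WITH (LOCATION = 'https://<storageaccount>.dfs.core.windows.net/<container>');

    -- Parquet file format definition.
    CREATE EXTERNAL FILE FORMAT ParquetFormat
    WITH (FORMAT_TYPE = PARQUET);

    -- External table over the Parquet files; the daily report queries this
    -- instead of the transactional Azure SQL table.
    CREATE EXTERNAL TABLE dbo.TransactionsExternal (
        TransactionId   BIGINT,
        TransactionDate DATE,
        Amount          DECIMAL(18, 2)
    )
    WITH (
        LOCATION = 'transactions/',
        DATA_SOURCE = TransactionsLake,
        FILE_FORMAT = ParquetFormat
    );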

Despicable me