Here is the scenario: we are using Azure Synapse analytics with 100 DTU, we have several tables. All tables are round robin distributed and clustered columnstore because data is loaded once in a day and we want it to be quick. All these tables are big having 50+ columns and millions of records (around 30m) There is one load_table table which contains information about last load of all the tables.
SQL Query1:
SELECT load_time FROM load_table WHERE table = 't2';
when I execute this query on table, this gives me one timestamp as output for that table. format is like this "YYYY-MM-DD TT". This query normally runs fine within 5-10 secs.
SQL Query2:
SELECT top 100 * FROM t2;
t2 is big fact table containing millions of records having 50+ cols. When I run this query it takes 2-3 mins.
Problem occurs when I join these two tables and it takes 2-3 hrs to execute. I want the result like loadtime as first column and all columns from second fact table.
joining is like below
SELECT t1.load_time, t2.* FROM
(SELECT load_time FROM load_table WHERE table = 't2') as t1 INNER JOIN t2
on 1 = 1;
How can I achieve this result another way?
I have checked statistics of fact table those are updated I have checked skew and it is fine