0

Some of my tables are of type REPLICATE. I would these tables to be actually replicated (not pending) before I start querying my data. This will help me avoid data movement.

I have a script, which I found online, which runs in a loop and do a SELECT TOP 1 on all the tables which are set for replication, but sometimes the script runs for hours. It may seem as the server sometimes won't trigger replication even if you do a SELECT TOP 1 from foo.

How can you force SQL Datawarehouse to complete replication?

The script looks something like this:

begin

CREATE TABLE #tbl
WITH
( DISTRIBUTION = ROUND_ROBIN
)
AS
SELECT 
       ROW_NUMBER() OVER(
       ORDER BY
                (
                    SELECT 
                           NULL
                )) AS                                            Sequence
     , CONCAT('SELECT TOP(1) * FROM ', s.name, '.', t.[name]) AS sql_code
FROM sys.pdw_replicated_table_cache_state AS p
JOIN sys.tables AS t
         ON t.object_id = p.object_id
JOIN sys.schemas AS s
         ON t.schema_id = s.schema_id
WHERE p.[state] = 'NotReady';

DECLARE @nbr_statements INT=
        (
            SELECT 
                   COUNT(*)
            FROM #tbl
        ), @i INT= 1;

    WHILE @i <= @nbr_statements

        BEGIN
        DECLARE @sql_code NVARCHAR(4000)= (SELECT 
        sql_code
        FROM #tbl
        WHERE Sequence = @i);
        EXEC sp_executesql @sql_code;
        SET @i+=1;
        END;
        DROP TABLE #tbl;
        SET @i = 0;
        WHILE
              (
                  SELECT TOP (1) 
                         p.[state]
                  FROM sys.pdw_replicated_table_cache_state AS p
                  JOIN sys.tables AS t
                           ON t.object_id = p.object_id
                  JOIN sys.schemas AS s
                           ON t.schema_id = s.schema_id
                  WHERE p.[state] = 'NotReady'
              ) = 'NotReady'
            BEGIN
                IF @i % 100 = 0
                    BEGIN
                        RAISERROR('Replication in progress' , 0, 0) WITH NOWAIT;
                    END;
                SET @i = @i + 1;
            END;

END
Henrik F
  • 1
  • 1

2 Answers2

1

Henrik, if 'select top 1' doesn't trigger a replicated table build, then that would be a defect. Please file a support ticket.

Without looking at your system, it is impossible to know exactly what is going on. Here are a couple of things that could be in factoring into extended build time to look into:

  • The replicated tables are large (size, not necessarily rows) requiring long build times.
  • There are a lot of secondary indexes on the replicated table requiring long build times.
  • Replicated table builds require statirc20 (2 concurrency slots). If the concurrency slots are not available, the build will queue behind other running queries.
  • The replicated tables are constantly being modified with inserts, updates and deletes. Modifications require the table to be built again.
  • Great insight, thanks. Replicated tables are not large. No secondary indexes. Runs under largerc. All changes are done when I run this script. Next step is to populate fact tables. What is wierd, is that sometimes this script run for hours and doesn’t stop until instance is beint scaled down. – Henrik F May 11 '19 at 06:13
0

The best way is to run a command like this as part of the job which creates/updates the table:

select top 1 * from <table>

That will force its redistribution at the correct time, without the slow loop through the stored procedure.

Ron Dunn
  • 2,971
  • 20
  • 27
  • Doing a select top 1 doesn't have such an effect. Select top 1 is triggering the table for a scheduled replication. It's not synchronous in that sense. You have no guarantees for when replication takes place, hence, that is what the script does, it waits for the tables to be replicated. – Henrik F May 07 '19 at 15:37