
We have a list of 50 tables in "SQL Server-A" on "PC-A" and the same 50 tables in "SQL Server-B" on "PC-B"; both machines are connected over a network.

"SQL server-A" acts as staging since it is directly connected with sensors. A python Program tries to transfers all the data from 50 tables in "SQL Server-A" to "SQL Server-B" every hour ("every hour, last 2 hours data will be transfered in incremental way"). While transfering, how to avoid the row insertion in "SQL Server-B" tables if the row already exist in the tables?

Earlier we used PostgreSQL in place of "Server-B". We bulk-transferred the 2 hours of data to a PostgreSQL "Temp" table, and then inserted from the "Temp" table into the actual table with an ON CONFLICT DO NOTHING query. What is its equivalent in SQL Server?

For example, the table in "Server-A", DB: DB-1, Table: Table-1 (here the "ID" column is the primary key):

ID Name Value       Timestamp 
1   X    12    2022-02-14 09:46:24.840 
2   Y    15    2022-02-14 09:47:24.840
3   A    35    2022-02-14 09:48:24.840
4   B    56    2022-02-14 09:49:24.840
5   C    86    2022-02-14 09:50:24.840

Table in "Server-B" and DB : DB-1 & Table: Table-1

ID Name Value       Timestamp 
1   X    12    2022-02-14 09:46:24.840 
2   Y    15    2022-02-14 09:47:24.840
3   A    35    2022-02-14 09:48:24.840

We need to insert only IDs 4 and 5 into Table-1 of DB-1 on Server-B and skip IDs 1, 2 and 3. What is the best way to do this in SQL Server, given that the tables in Server-B hold millions of rows?
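For illustration only (not part of the original question), the hourly insert-if-absent step can be sketched in Python. The table and column names follow the example above; sqlite3 stands in for the two SQL Server instances purely so the sketch is self-contained and runnable:

```python
import sqlite3

# In-memory database standing in for "Server-B"; in production this would be
# a SQL Server connection (e.g. via an ODBC driver) -- sqlite3 is used here
# only to keep the example self-contained.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Target table on "Server-B" already holds IDs 1-3.
cur.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT, value INTEGER)")
cur.executemany("INSERT INTO table1 VALUES (?, ?, ?)",
                [(1, "X", 12), (2, "Y", 15), (3, "A", 35)])

# Staging table holding the last 2 hours of rows pulled from "Server-A";
# these overlap the already-transferred IDs 2 and 3.
cur.execute("CREATE TABLE staging (id INTEGER PRIMARY KEY, name TEXT, value INTEGER)")
cur.executemany("INSERT INTO staging VALUES (?, ?, ?)",
                [(2, "Y", 15), (3, "A", 35), (4, "B", 56), (5, "C", 86)])

# Anti-join insert: copy only the rows whose id is not already in the target.
cur.execute("""
    INSERT INTO table1 (id, name, value)
    SELECT s.id, s.name, s.value
    FROM staging s
    WHERE NOT EXISTS (SELECT 1 FROM table1 t WHERE t.id = s.id)
""")
conn.commit()

inserted_ids = [row[0] for row in cur.execute("SELECT id FROM table1 ORDER BY id")]
print(inserted_ids)  # -> [1, 2, 3, 4, 5]
```

With the primary key indexed (which it is by definition), the NOT EXISTS probe is an index seek per staged row, so only the duplicate-free rows (IDs 4 and 5) are inserted.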

halfer
Nandan
  • Use an `EXISTS` and filter the data out that you don't want to `INSERT`/`UPDATE`. – Thom A Mar 17 '22 at 10:49
  • The destination table is huge. Is it the best way to bulk-transfer all 50 tables every hour? – Nandan Mar 17 '22 at 10:56
  • Define "huge", @Nandan. SQL Server can trivially handle millions of rows in a single statement. Provided your primary key columns are properly indexed, an `EXISTS` will be very performant. If you want to **replicate** the data, however, then you should be looking at mirroring/replication in my opinion. – Thom A Mar 17 '22 at 11:01
  • Each of the 50 tables has more than 15 lakh rows, and that will grow further over time. If I check `EXISTS` on the primary key, will I be able to insert 5000 rows every hour for all 50 tables? – Nandan Mar 17 '22 at 11:24
  • I don't know what a "Lakh" is, but unless it means 10 million or more then, again, SQL Server will have no problems handling that few rows. – Thom A Mar 17 '22 at 11:32
  • (I could have sworn I reopened this after I closed it. Coffee hadn't set in enough to remind me I would gold-badge it when I made the suggestion. Apologies. Time for a 2nd coffee) – Thom A Mar 17 '22 at 11:43
  • @Larnu - "Lakh" is a term from India. It means 10 ** 5, or 100_000. – Gord Thompson Mar 17 '22 at 13:40

1 Answer


Earlier we used PostgreSQL in place of "Server-B". We bulk-transferred the 2 hours of data to a PostgreSQL "Temp" table, and then inserted from the "Temp" table into the actual table with an ON CONFLICT DO NOTHING query. What is its equivalent in SQL Server?

For a PostgreSQL INSERT with ON CONFLICT DO NOTHING like

    INSERT INTO tbl_main (id, txt)
    SELECT id, txt FROM tbl_temp
    ON CONFLICT DO NOTHING

(assuming a single-column primary key on id), the equivalent in T-SQL would be

    INSERT INTO tbl_main (id, txt)
    SELECT id, txt FROM tbl_temp
    WHERE id NOT IN (SELECT id FROM tbl_main)
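One caveat the original answer does not raise: NOT IN behaves surprisingly when the subquery can return NULL, whereas a NOT EXISTS anti-join does not. With a true primary key on id, as in the question, NULLs cannot occur and the two forms are equivalent, but NOT EXISTS is the safer general pattern. A minimal demonstration of the difference, using Python's sqlite3 in place of SQL Server purely to keep it self-contained (the NULL-comparison semantics shown are standard SQL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Main table with a deliberately nullable id to expose the pitfall; with a
# real PRIMARY KEY, as in the question, this cannot happen.
cur.execute("CREATE TABLE tbl_main (id INTEGER, txt TEXT)")
cur.executemany("INSERT INTO tbl_main VALUES (?, ?)", [(1, "a"), (None, "b")])

cur.execute("CREATE TABLE tbl_temp (id INTEGER, txt TEXT)")
cur.executemany("INSERT INTO tbl_temp VALUES (?, ?)", [(1, "a"), (2, "c")])

# NOT IN: the NULL in tbl_main.id makes "id NOT IN (...)" evaluate to
# unknown for every candidate row, so this inserts nothing.
cur.execute("""INSERT INTO tbl_main (id, txt)
               SELECT id, txt FROM tbl_temp
               WHERE id NOT IN (SELECT id FROM tbl_main)""")
after_not_in = cur.execute("SELECT COUNT(*) FROM tbl_main").fetchone()[0]

# NOT EXISTS compares row by row and is unaffected by the NULL: it inserts
# the one genuinely new row (id = 2).
cur.execute("""INSERT INTO tbl_main (id, txt)
               SELECT t.id, t.txt FROM tbl_temp t
               WHERE NOT EXISTS (SELECT 1 FROM tbl_main m WHERE m.id = t.id)""")
after_not_exists = cur.execute("SELECT COUNT(*) FROM tbl_main").fetchone()[0]

print(after_not_in, after_not_exists)  # -> 2 3
```

On SQL Server, with the primary key index in place, the NOT EXISTS version typically compiles to the same anti-semi-join plan as NOT IN, so there is no performance cost to preferring it.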
halfer
Gord Thompson
  • Is the T-SQL solution "WHERE id NOT IN (SELECT id FROM tbl_main)" the best way, or is there another approach that would perform even better? – Nandan Mar 18 '22 at 04:11