0

I'm working in an environment where data exists on numerous client databases and one by one is pulled into a central data repository via sql.

To automate a testing process, I've written a really nice, streamlined push-button script that backups, purges and re-extracts data on a user-specified client database. It then restores the data from backup tables. It makes heavy use of synonyms to streamline the code.

I'm running into a performance problem with the purge process where the DELETE query incurs a Remote Scan. It exactly the same query as the INSERT/SELECT which is simply passed as a Remote Query.

This INSERT works great:

INSERT INTO origChild
    SELECT child.*
    FROM
        bakParent par
        JOIN bakChild child ON par.GUID = child.GUID
    WHERE
        par.DateInserted = '2013-08-12 20:30:42.920'

This DELETE performs poorly:

DELETE
    bakChild
FROM
    bakParent par
    JOIN bakChild child ON par.GUID = child.GUID
WHERE
    par.DateInserted = '2013-08-12 20:30:42.920'

Below are the estimated query execution plans. The Remote Scan pulls 5M+ records while the INSERT/SELECT only deals with ~16,000 records.

I can't figure out why the plans are so different. I understand that queries to linked servers can lead to performance issues but the two JOINs are identical. I would expect them to be the same. (Or there should be a way for me to get the DELETE to perform similarly to the INSERT.)

I have confirmed that removing the INSERT part in the first query has the same execution plan.

Any suggestions?

enter image description here

Trent Baur
  • 95
  • 1
  • 9
  • Could you give us some clue about where bakChild, bakParent, and origChild are? From the looks of it, the first plan simply throws all of the optimization logic over to the other server. The second plan does it differently because it has to perform a delete, and for whatever reason, it thinks it is more efficient to pull the data to the local server and perform the hash match here. You could *try* to coerce a different behavior with `INNER REMOTE JOIN`... – Aaron Bertrand Aug 14 '13 at 23:38
  • The two queries originally run within a single script. bakParent/bakChild/origParent all exist on the same remote database. – Trent Baur Aug 15 '13 at 12:29
  • I tried out INNER REMOTE JOIN and the execution plan still uses a Remote Scan. If anything, the plan looks more involved. :( – Trent Baur Aug 15 '13 at 12:35
  • Have you considered creating a stored procedure on the remote system, and executing the stored procedure remotely? This prevents your local server from even thinking about taking on any of the responsibility. – Aaron Bertrand Aug 15 '13 at 13:05
  • That would definitely fix the problem. But the goal of the script is to not require any code to be present on the remove/client databases. I just want one push-button script that can be run off of a central repository-type database that will process data based on user-specified server/database parameters. – Trent Baur Aug 15 '13 at 14:08
  • And how have you determined that the remote scan is exactly the cause of the performance issue? Could it just be that the delete is expensive (due to lack of supporting indexes, too many indexes to affect, etc.)? – Aaron Bertrand Aug 15 '13 at 14:09

0 Answers0