2

I have a query that collects data from six tables of the same type. To create the final selection, I use UNION ALL. Unfortunately, the optimizer scans each of the six tables in turn, and then collects into one. Is there any way to make the optimizer scan tables in parallel?

  |--Concatenation
       |--Index Scan(OBJECT:([EGeoCache].[NonClusteredIndex-20190815-105027] AS [GC]))
       |--Index Scan(OBJECT:([YGeoCache].[NonClusteredIndex-20190814-103125] AS [GC]))
       |--Index Scan(OBJECT:([GGeoCache].[NonClusteredIndex-20190814-103358] AS [GC]))
       |--Index Scan(OBJECT:([HGeoCache].[NonClusteredIndex-20190814-103422] AS [GC]))
       |--Index Scan(OBJECT:([DGeoCache].[NonClusteredIndex-20190814-103305] AS [GC]))
       |--Index Scan(OBJECT:([SGeoCache].[NonClusteredIndex-20190814-103457] AS [GC]))

SELECT
    VEGC.AddressID
  , VEGC.Lat
  , VEGC.Lon
FROM    vEGeoCache AS VEGC
UNION ALL
SELECT
    VYGC.AddressID
  , VYGC.Lat
  , VYGC.Lon
FROM    vYGeoCache AS VYGC
UNION ALL
SELECT
    VGGC.AddressID
  , VGGC.Lat
  , VGGC.Lon
FROM    vGGeoCache AS VGGC
UNION ALL
SELECT
    VHGC.AddressID
  , VHGC.Lat
  , VHGC.Lon
FROM    vHGeoCache AS VHGC
UNION ALL
SELECT
    VDGC.AddressID
  , VDGC.Lat
  , VDGC.Lon
FROM    vDGeoCache AS VDGC
UNION ALL
SELECT
    VSGC.AddressID
  , VSGC.Lat
  , VSGC.Lon
FROM    vSGeoCache AS VSGC
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
user2996299
  • 123
  • 3
  • 3
    This is a really good question. I don't think there is a way to have these run in parallel, but I would be curious to learn if there is a way. – Gordon Linoff Aug 16 '19 at 09:15
  • Since I cannot reproduce your case exactly, I selected the field of the same type from multiple tables and execution is parallel, paralelism is even visible if you turn on `Include LIve Query Statistics`. – avb Aug 16 '19 at 11:38
  • Are the tables on separate devices (or memory resident) so that parallel execution would not create an I/O bottleneck? Otherwise it isn't clear what benefit you expect from parallel execution. – HABO Aug 16 '19 at 12:18

1 Answers1

2

To force a parallel execution plan you can use OPTION (QUERYTRACEON 8649) which requires sysadmin rights. On SQL Server 2016+ you can use OPTION(USE HINT('ENABLE_PARALLEL_PLAN_PREFERENCE')) which does not require sysadmin rights.

Both QUERYTRACEON 8649 and ENABLE_PARALLEL_PLAN_PREFERENCE are undocumented which means they are unsafe to run in Production (in my book). Your third option is to use Make_Parallel by Adam Machanic which is not undocumented (it just uses old school math).

using each looks like this:

...
FROM    vDGeoCache AS VDGC
UNION ALL
SELECT
    VSGC.AddressID
  , VSGC.Lat
  , VSGC.Lon
FROM    vSGeoCache AS VSGC
OPTION (QUERYTRACEON 8649);

...
FROM    vDGeoCache AS VDGC
UNION ALL
SELECT
    VSGC.AddressID
  , VSGC.Lat
  , VSGC.Lon
FROM    vSGeoCache AS VSGC
OPTION(USE HINT('ENABLE_PARALLEL_PLAN_PREFERENCE'))

...
FROM    vDGeoCache AS VDGC
UNION ALL
SELECT
    VSGC.AddressID
  , VSGC.Lat
  , VSGC.Lon
FROM    vSGeoCache AS VSGC
CROSS JOIN dbo.make_parallel();

Make_parallel is the safest way to go but creates a bloated execution plan. The way I do it is: I use OPTION (QUERYTRACEON 8649) in my testing in Dev. The most important thing to note is that these options don't guarantee a parallel plan. Make sure to run your queries with "Include Actual Execution plan" turned on to see if it's working. If there are any parallelism inhibiting components (such as a scalar UDF as a computed column or check constraint) then nothing will force a parallel plan. If I determine that forcing a parallel plan is the way to go (something to force with extreme caution) then I use make_parallel in Production.

Alan Burstein
  • 7,770
  • 1
  • 15
  • 18