Awkward JOIN causes poor performance

Question

I have a stored procedure that combines data from several tables via UNION ALL. If the parameters passed in to the stored procedure don't apply to a particular table, I attempt to "short-circuit" that table by using "helper bits", e.g. @DataSomeTableExists and adding a corresponding condition in the WHERE clause, e.g. WHERE @DataSomeTableExists = 1

One (psuedo) table in the stored procedure is a bit awkward and causing me some grief.

DECLARE @DataSomeTableExists BIT = (SELECT CASE WHEN EXISTS(SELECT * FROM #T WHERE StorageTable = 'DATA_SomeTable') THEN 1 ELSE 0 END);
...  

UNION ALL

SELECT *
FROM REF_MinuteDimension AS dim WITH (NOLOCK)  
CROSS JOIN  (SELECT * FROM #T WHERE StorageTable = 'DATA_SomeTable') AS T       
CROSS APPLY dbo.fGetLastValueFromSomeTable(T.ParentId, dim.TimeStamp) dpp
WHERE @DataSomeTableExists = 1 AND dim.TimeStamp >= @StartDateTime AND dim.TimeStamp <= @EndDateTime

UNION ALL

...

Note: REF_MinuteDimension is nothing more than smalldatetimes with minute increments.

(1) The execution plan (below) indicates a warning on the nested loops operator saying that there is no join predicate. This is probably not good, but there really isn't a natural join between the tables. Is there a better way to write such a query? For each ParentId in T, I want the value from the UDF for every minute between @StartDateTime and @EndDateTime.

(2) Even when @DataSomeTableExists = 0, there is I/O activity on the tables in this query as reported by SET STATISTICS IO ON and the actual execution plan. The execution plan reports 14.2 % cost which is too much considering these tables don't even apply in this case.

SELECT * FROM #T WHERE StorageTable = 'DATA_SomeTable' comes back empty.

Is it the way my query is written? Why wouldn't the helper bit or an empty T short circuit this query?

enter image description here

You should change that `DECLARE` thing to this: `declare @DataSomeTableExists bit = 0 if exists(SELECT * FROM #T WHERE StorageTable = 'DATA_SomeTable') begin set @DataSomeTableExists = 1 end`. That's hard to read, so here's [pastebin](http://pastebin.com/4y0VTJCD). And why don't you have your "short circuit" `WHERE` argument in the `CROSS JOIN` query also? I should mention, I have *no clue* if this will ever work, as it's kind of an odd way to write a sproc. — tommy_o, Jan 27 '14 at 19:41

score 0 · Answer 1 · answered Jan 27 '14 at 19:14

For 2) I can sure say that line

CROSS JOIN  (SELECT * FROM #T WHERE StorageTable = 'DATA_SomeTable') AS T

Ill force #T to be analysed and to enter a join. You can create to versions of a SP with and without that join and use that flag to execute one or another but I cannot say that ill save any response time||cpu clocks||I/O bandwith||memory.

For 1) I suggest to remove the (nolock) if you are using SQL Server 2005 or better and to keep a close eye in that UDF. Cannot say more without a good SQL fiddle.

score 0 · Answer 2 · answered Jan 27 '14 at 19:52

I should mention, I have no clue if this will ever work, as it's kind of an odd way to write a sproc and table-valued UDFs aren't well understood by the query optimizer. You might have to build your resultset into a table variable or temp table conditionally, based on IF statements, then return that data. But I would try this, first:

--helper bit declared
declare @DataSomeTableExists BIT = 0x0
if exists (select 1 from #T where StorageTable = 'DATA_SomeTable')
begin
    set @DataSomeTableExists = 0x1
end

...  

UNION ALL

SELECT *
FROM REF_MinuteDimension AS dim WITH (NOLOCK)  
CROSS JOIN  (SELECT * FROM #T WHERE StorageTable = 'DATA_SomeTable' and @DataSomeTableExists = 0x1) AS T
CROSS APPLY dbo.fGetLastValueFromSomeTable(T.ParentId, dim.TimeStamp) dpp
WHERE @DataSomeTableExists = 0x1 AND dim.TimeStamp >= @StartDateTime AND dim.TimeStamp <= @EndDateTime

UNION ALL

...

And if you don't know already, the UDF might be giving you weird readings in the execution plans. I don't know enough to give you accurate data, but you should search around to understand the limitations.

Thanks for the reply. That seems to do the trick. Curious why you changed the helper bit declaration? Is this just for readability? — John Russell, Jan 27 '14 at 20:17
Yeah. You're declaring a variable, setting it's default value, while also querying using `EXISTS` with a subquery `SELECT`. That's a lot for the next developer to sit down and read when there are other ways to easily write it. — tommy_o, Jan 27 '14 at 21:09

score -1 · Answer 3 · answered Jan 27 '14 at 20:33

-1

Since your query is dependent on run-time variables, consider using dynamic SQL to create your query on the fly. This way you can include the tables you want and exclude the ones you don't want.

There are downsides to dynamic SQL, so read up

answered Jan 27 '14 at 20:33

gmakepeace

21
2

A better alternative, as @jean has suggested, is to create one stored procedure each for when the data needs to be pulled and when the data does not. Then, from a master sproc, conditionally fire one sproc or the other. Then you are using *dynamic logic* (and retaining query plans) but not *dynamic SQL* (which is almost never really required, so should be avoided). – tommy_o Jan 27 '14 at 21:11

Awkward JOIN causes poor performance

3 Answers3