Table Valued Function Killing My Query Performance

Question

I was having a horrible time today trying to get a query to perform the way I would expect. I had to make a slight change to a table valued function that lives in the query yesterday and that change created a huge performance impact on the query. After evaluating the execution plan and looking at statistics IO and Time I found that because I changed the function to return a table variable instead of just a result set it was doing a full scan on one of the tables being queried.

My question is why would having it return the table (TableVariable) instead of just a Select / Result set cause such a big change to the plan?

Stumped....

score 70 · Accepted Answer · edited Dec 09 '21 at 06:50

70

Returning a Table Variable will make it a multi-statement table valued function and can be bad for performance due to the fact that it's treated like a table except there are no statistics available for SQL Server to base a good execution plan on - so it will estimate the function as returning a very small number of rows. If it returns a larger number of rows, then therefore the plan generated could be a lot less than optimal.

Whereas, returning just a SELECT makes it an inline table valued function - think of it more as a view. In this case, the actual underlying tables get brought into the main query and a better execution plan can be generated based on proper statistics. You'll notice that in this case, the execution plan will NOT have a mention of the function at all as it's basically just merged the function into the main query.

There's a great reference on it on MSDN by CSS SQL Server Engineers including (quote):

But if you use multi-statement TVF, it’s treated as just like another table. Because there is no statistics available, SQL Server has to make some assumptions and in general provide low estimate. If your TVF returns only a few rows, it will be fine. But if you intend to populate the TVF with thousands of rows and if this TVF is joined with other tables, inefficient plan can result from low cardinality estimate.

edited Dec 09 '21 at 06:50

Glorfindel

21,988
13
81
109

answered Nov 05 '10 at 19:08

AdaTheDev

142,592
28
206
200

3

Added a quote from the reference CSS article which explains better than me :) I have to disagree with the downvote. – AdaTheDev Nov 05 '10 at 19:34
Saw Kimberly Tripp do a demo on this at a conference once, with a very simple UDF that involved no tables at all, (so statistics would be irelevant). ALl the udf did was convert an input parameter using some simple expression based on another column in each row. One udf was standard table value udf, the other was inline udf. SQL generated couple million rows, (one for each row in table). Inline udf processed in less than a second. Standard udf took several minutes. – Charles Bretana Nov 06 '10 at 00:56
5

@charles Bretana Yes but you are talking about going from inline scalar UDF to table valued UDF. Whereas OP has gone from inline table valued function to multi statement table valued UDF. So my answer is correct in this scenario/context and a downvote is not justified – AdaTheDev Nov 06 '10 at 04:27
1

@AdaTheDev, On that restricted basis I will remove the downvote, but perhaps you're missing the point. Whether the op is talking about a scalr udf or a multi-statement table-valued udf that is dependant only on column values from the same row, statistics on that table or any pther table are not relevant (They would have no impact on the cache plan or on the performance). It is only in the case of a multi-Valued statement valued udf that is based on data from other rows in the same table or from data in other tables, that statistics might affect the answer. – Charles Bretana Nov 06 '10 at 14:26
1

Secondly, my example was designed to illustrate the enormous impact on performance of having to recompile the udf for EVERY ROW it is used in. This effect is demonstrably so large that on the face of it, it would override whatever effect not having statistics might cause. I presented the example to make that point. I believe the reference you provided is, simply, not accurate in that it neglects to discuss this more significant efect that occurs from all udfs except inline udfs. The point is that it is the inlining that is significant. – Charles Bretana Nov 06 '10 at 14:29
As to the ops issue, we don't know the nature of his/her table-valued udf, we don't know from his question whether his/her udf reads table data at all, or only from the one row it's output is displayed in, or from other rows in that table, or from other tables... – Charles Bretana Nov 06 '10 at 14:33
I have a query that uses an inline table-valued function. It takes 2-3 minutes to run. If I replace the inline table-valued function with a subselect with the same query as the function, the query completes in less than 1 second. That doesn't make sense to me. I will be steering clear of any table-valued functions, inline or not, because while they can improve code reusability, their performance seems completely random at times. – Kevin Doyon Oct 03 '17 at 14:40
Great answer. This was exactly what I was looking for: "You'll notice that in this case, the execution plan will NOT have a mention of the function at all as it's basically just merged the function into the main query." – julealgon Oct 19 '18 at 22:26

Charles Bretana · Answer 2 · 2015-06-11T18:22:52.427

5

This is because a multi-Statement Table valued UDF cannot be processed inline with the rest of the SQL statememnt it is used in, and therefore cannot be part of the statement cache plan.. That means that it must be compiled separately from the rest of the SQL it is used in, over and over, for every row in the final resultset generated by the query.

An Inline Table valued UDF, otoh, is processed and compiled along with the sql it is used in, and it therefore becomes part of the cache plan and only gets processed and compiled once, no matter how many rows you generate.

edited Jun 11 '15 at 18:22

answered Nov 05 '10 at 19:18

Charles Bretana

143,358
22
150
216

Isn’t the difference between a multi-statement table-valued UDF and an inline table-valued UDF more like the difference bewteen a table variable and a table? – binki Jun 11 '15 at 18:02
No, it is not. It is what my answer says. And the difference between a table variable and a table? or do you mean the difference between the resultset created by a select and the resultset that a table variable points to ? - because, once created, there's no difference between those at all. But there's an enormous difference between a T-SQL variable (Whatever it might point to, and a resultset, (no matter what was used to generate it and construct it) – Charles Bretana Jun 11 '15 at 18:26
1

Your answer claims that the issue is query plan caching. But if the multi-statement table-valued UDF is in a query’s `FROM` or `JOIN` clause, it is only ever executed once and, if UDFs really do not have their query plans cached, that is still just one compilation rather than a compilation per row. However, your description of how inline table valued UDFs works seems correct. I’m talking about how SQL Server handles a simple unindexed table variable and table differently when listed in `FROM` or `JOIN` clauses. – binki Jun 11 '15 at 18:39
No, it will be compiled over and over for each time it is executed. So if it is a correlated statement (one where the output is dependent on the row by row values of the other columns in the resultset, then it will be executed (and therefore re-compiled) once for each row in the resultset. – Charles Bretana Jun 11 '15 at 18:48
Only if the output is the same set of rows, (with the same values) for every row of the resultset, will it be executed(and compiled) only once... And if that's the case, you could just as easily run it as a separate SQL statement in a batch, and use the results in the second statement just like an independent table. – Charles Bretana Jun 11 '15 at 18:50

score 5 · Answer 3 · edited Jul 31 '17 at 14:32

On the SQL Server 2014 we were able to solve our issue by inserting table value function data into temp table and then doing join on it. Instead of doing a join directly to table value function.

This improved our execution time from 2 min to 4 secs.

Here is an example that worked for our team:

--SLOW QUERY (2 min):

DECLARE @id INT = 1;

SELECT * 
FROM [data].[someTable] T
INNER JOIN [data].[tableValueFunction](@id) TVF ON TVF.id = T.id;

--FAST QUERY (4 sec):

DECLARE @id INT = 1;

SELECT * 
INTO #tableValueFunction
FROM [data].[tableValueFunction](@id) TVF

SELECT * 
FROM [data].[someTable] T
INNER JOIN #tableValueFunction TVF ON TVF.id = T.id;

score 3 · Answer 4 · answered Nov 05 '10 at 19:09

Really impossible to answer definitively without more information. However, since I like to take crazy stabs in the dark . . .

Table variables can't be optimized by the engine--the engine always "assumes" that the table variable only has one row in it when it generates an execution plan. That is one reason why you might be seeing strange performance.

binki · Answer 5 · 2017-06-19T19:10:52.213

When using multi-statement table-valued UDF, that UDF is run to completion before its results can be used by the caller. With an inline table-valued UDF, the SQL Server basically expands the UDF into the calling query just like macro expansion. This has the following implications, among others:

The calling query’s WHERE clause can be interpolated directly into an inline table-valued UDF, but not a multi-statement UDF. Thus, if your table-valued UDF generates a lot of rows that would be filtered out by the calling query’s WHERE clause, the query optimizer can apply down the WHERE clause directly into an inline table-valued UDF but not into a multi-statement UDF.
An inline table-valued UDF behaves like a parameterized VIEW would if SQL Server had such a concept whereas a multi-statement table-valued UDF would behave like you populated and then used a table variable in your query.

If your UDF returns many rows and is backed by a table, I imagine this could be where the table scan is coming from. Either add more parameters to your UDF to enable the caller to constrain its result size or try to reformulate it as an inline table-valued UDF with the help of friends such as UNION et al. I would avoid multi-statement table-valued UDFs at all costs unless if the result size is known to only be a few rows and it is hard to produce the required results with set-based logic.

Table Valued Function Killing My Query Performance

5 Answers5

Linked