
Execution time differs too much between the queries below. These are the generated queries from an app using Entity Framework.

The first one is a non-parameterized query that takes 0.559 seconds.

SELECT 
"Project1"."C2" AS "C1", 
"Project1"."C1" AS "C2", 
"Project1"."KEYFIELD" AS "KEYFIELD"
FROM ( SELECT 
    "Extent1"."KEYFIELD" AS "KEYFIELD", 
     CAST( "Extent1"."LOCALDT" AS date) AS "C1", 
    2 AS "C2"
    FROM "MYTABLE" "Extent1"
    WHERE (
  ("Extent1"."LOCALDT" >= to_timestamp('2017-01-01','YYYY-MM-DD')) AND 
  ("Extent1"."LOCALDT" <= to_timestamp('2018-01-01','YYYY-MM-DD'))
  )
)  "Project1"
ORDER BY "Project1"."C1" DESC;

The other one has a parameterized WHERE clause. It takes 18.372 seconds to fetch the data:

SELECT 
"Project1"."C2" AS "C1", 
"Project1"."C1" AS "C2", 
"Project1"."KEYFIELD" AS "KEYFIELD"
FROM ( SELECT 
    "Extent1"."KEYFIELD" AS "KEYFIELD", 
     CAST( "Extent1"."LOCALDT" AS date) AS "C1", 
    2 AS "C2"
    FROM "MYTABLE" "Extent1"
    WHERE (
  ("Extent1"."LOCALDT" >= :p__linq__0) AND 
  ("Extent1"."LOCALDT" <= :p__linq__1)
  )
)  "Project1"
ORDER BY "Project1"."C1" DESC;

I know that parameterized queries are useful for plan caching. How can I improve the performance of the parameterized query?

APC
Tunahan

1 Answer


"parameterized queries are pretty useful for caching"

Just to be clear, when we use bind variables what gets cached is the parsed query and the execution plan. The assumption is that given a query like ...

where col1 = :p1
and   col2 = :p2

... the same plan works as well when :p1 = 23 and :p2 = 42 as when :p1 = 42 and :p2 = 23. If our data has an even distribution then the assumption holds. But if our data is skewed in some way we may end up with a plan which works well for one specific combination of values but is rubbish for most of the other queries our users need to run. This phenomenon is known as bind variable peeking: the optimizer peeks at the bind values on the first execution, builds a plan for those values, and then reuses that plan for every subsequent set of values.
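One way to check for this on Oracle is to look at the cursor in v$sql and display the plan that was actually executed. A sketch, assuming you can identify the statement by its text (the LIKE filter and the 'your_sql_id' placeholder are just for illustration):

```sql
-- Find the cursor(s) for the parameterized statement; multiple child
-- cursors, or IS_BIND_SENSITIVE = 'Y', are signs that bind values matter
SELECT sql_id, child_number, is_bind_sensitive, is_bind_aware, executions
FROM   v$sql
WHERE  sql_text LIKE '%MYTABLE%LOCALDT%';

-- Show the plan that was actually used for a given child cursor
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('your_sql_id', 0));
```

If the plan for the bind version differs from the plan you get with literals, bind peeking is a likely culprit.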

Date range queries are a notorious case in point. Your first query provides literal values that match a well-defined range; presumably that retrieves a narrow slice of the table. With the second query, however, the specified date range could be anything: a day, a week, a month, a year ... well, you get the picture.

The upshot is that an index range scan could be very efficient for the first query and shocking for the second.

To understand more you need to explore the specific query:

  • Run explain plans for the two versions of the query and understand the differences. (Make sure you're working with realistic, production-like data: not just volumes but distribution and skew as well.)
  • Check the statistics are accurate, and consider whether refreshing them might help.
  • Understand the skew of the data, and check whether you are suffering from bind variable peeking. Perhaps you need to look at adaptive cursor sharing.
  • Alternatively you may need to avoid using bind variables. Especially with date-range queries on large tables it is not unusual to pass actual values for the date arguments. The cost of hard-parsing the query on each execution is offset by getting the best plan for each set of parameters.
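The first two bullets can be sketched like this on Oracle (note that EXPLAIN PLAN does not peek at bind values, so compare it against the cursor's actual plan; USER and the column list are placeholders for your own schema and query):

```sql
-- Plan for the literal version of the predicate
EXPLAIN PLAN FOR
SELECT "Extent1"."KEYFIELD"
FROM   "MYTABLE" "Extent1"
WHERE  "Extent1"."LOCALDT" >= to_timestamp('2017-01-01','YYYY-MM-DD')
AND    "Extent1"."LOCALDT" <= to_timestamp('2018-01-01','YYYY-MM-DD');

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- Refresh optimizer statistics (including histograms) if they are stale
EXEC DBMS_STATS.GATHER_TABLE_STATS( -
       ownname => USER, tabname => 'MYTABLE', -
       method_opt => 'FOR ALL COLUMNS SIZE AUTO')
```

If the literal plan uses an index range scan while the bind version falls back to a full table scan (or vice versa), you have found your 0.5-second versus 18-second difference.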

In short, we should understand our data and the way our users need to work with it, then write queries accordingly.

APC