0

I would like to process massive amount of SQL queries and only keep unique ones. I mean same queries with different parameters will be removed and only 1 signature query will be left for each unique query.

Now what i am asking in this question is the programming logic. Pseudo code perhaps or real .net code. Or any third party plugin/software/framework which is free.

Here example case:

    select PathSquares from tblPathFinding where RouteId=470 
and StartingSquareId=267 and ExitSquareId=13;

    select PathSquares from tblPathFinding where RouteId=470 
and StartingSquareId=267 and ExitSquareId=1472;

As you can see above queries, they are actually same query with different parameter. So with a logic, i need to eliminate such duplicate queries and keep only 1 which would be enough for Database Engine Tuning Advisor.

What kind of logic and approach should i follow to detect duplicate queries ?

Waiting your suggestions ty

I will use .net 4.5 C# WPF application to solve this problem

Furkan Gözükara
  • 22,964
  • 77
  • 205
  • 342
  • You're basically going to be writing a SQL parser to do this, no small task. – DavidG Oct 14 '14 at 13:34
  • @DavidG i see. Do you know any existing one ? Or any article etc. I don't want to invent wheel again :) – Furkan Gözükara Oct 14 '14 at 13:34
  • I don't sorry, and I suspect that this question will get closed as it's either too broad or asking for a tool recommendation. – DavidG Oct 14 '14 at 13:35
  • @DavidG do you think that removing words following to = and keeping rest as unique query would work ? – Furkan Gözükara Oct 14 '14 at 13:36
  • There's nothing that prevents people from writing Yoda queries - `where 470=RouteId` would be perfectly valid. – Damien_The_Unbeliever Oct 14 '14 at 13:40
  • I've got a vague recollection that one of the tools from the [RML Utilities](http://support.microsoft.com/kb/944837) does this filtering for you when you ask it to do an analysis, but it's about 5 years since I used it and can't remember the details (and this is a tool recommendation) – Damien_The_Unbeliever Oct 14 '14 at 13:44
  • You can partially cheat. You can look in the execution plan cache to see what SQL Server itself considers queries that are "the same": `select t.[text] from sys.dm_exec_cached_plans p cross apply sys.dm_exec_sql_text(p.plan_handle) t`. If you do this you'll also hit on the limitations of this approach, which is that unifying queries doesn't work very neatly if parameters are not passed with explicit sizes (which causes a lot of plan cache pollution). Still, it's something. – Jeroen Mostert Oct 14 '14 at 15:42

1 Answers1

3

I don't know the programming logic but I can point you to an application I've used that does this. The ClearTrace application takes a SQL Server trace and "normalizes" all the queries within the trace.

They were somehow able to understand what could be replaced as a variable within a query.

enter image description here

Cory
  • 12,404
  • 7
  • 33
  • 28
  • ty however it doesnt work on sql server 2014 :D however i have an idea how to do this. executing each query with xml plan and parsing plan hash :D – Furkan Gözükara Oct 15 '14 at 16:42
  • Cory, What does # indicate in the clearTrace results? Is it the number of times the sql was executed? – Daniel Feb 15 '17 at 11:13