1

let's say i have a table with 3 million rows, the table does not have a PK nor Indexes.

the query is as follows

SELECT SKU, Store, ColumnA, ColumnB, ColumnC
FROM myTable
WHERE (SKU IN (select * from splitString(@skus)) OR @skus IS NULL)
AND (Store IN (select * from splitString(@stores)) OR @stores IS NULL)

Please consider that @sku and @store are NVARCHAR(MAX) containing a list of ids separated by comma. SplitString is a function which converts a string in format '1,2,3' to a table of 1 column and 3 rows as shown in the following picture.

This pattern allows me to send arguments from the application and filter by sku or by store or both or none.

Description of SplitString

What can I do to improve performance of this query? - I know Indexes are a good idea, but I don't really know about that stuff, so a guidance to that will be helpful. Any other ideas?

Gabriel Espinoza
  • 385
  • 1
  • 18
  • try using exists... this code WHERE (SKU IN (select * won't event work – RoMEoMusTDiE Jan 08 '18 at 21:36
  • @hatchet could you elaborate your answer, I don't quite understand what you are saying and why is important – Gabriel Espinoza Jan 08 '18 at 22:01
  • you can't expect to run queries against an unindexed table with several million records and get respectable results...index your table. – user2366842 Jan 08 '18 at 22:59
  • use Execution Plan to optimize your query performance and AVOID OR operator as much as possible, read more about it http://ubitsoft.com/help_19/html/2b9aa1e4-1d51-47ec-bbb6-503e9ad114bf.htm – iSR5 Jan 09 '18 at 00:22
  • plus, you should consider tables rather than functions which creates tables. IF possible, do the work in reverse, meaning, you could create a permanent table and control it with a stored procedure, and modify your application to insert all data inside that table, so the application will only sync between the database and whatever data on the other end. then, you can index your tables and improve their performance. – iSR5 Jan 09 '18 at 00:28
  • Maybe I should have clarified. The database is normalized (at least the most part), this table is a report one, I mean, a table which contains processed data ready to show to the customer. I created a PK and 2 non-clustered indexes, one for SKU and the another for Store. The performance does not improve :( – Gabriel Espinoza Jan 09 '18 at 12:18
  • *"the table does not have a PK "* - Well then, it's not much of a table now, is it? – Zohar Peled Jan 10 '18 at 14:36

3 Answers3

1

Another thing to avoid is using functions in your Where clause. That will slow a query down.

Try putting this at the beginning of your script, before the first SELECT:

    SELECT skus_group INTO #skus_group 
    FROM  (SELECT item AS skus_group FROM     
    splitstring(@skus, ','))A;

Then replace your WHERE clause:

    WHERE SKU IN(Select skus_group FROM #skus_group)

This normally improves performance because it takes advantage of indexes instead of a table scan, but since you're not using any indexes I'm not sure how much performance gain you'll get.

jackstraw22
  • 517
  • 10
  • 30
  • I definetly tried this but I didn't see any noticeable improvement, however, I really like how tidy the query gets once you have everything by parts – Gabriel Espinoza Jan 09 '18 at 13:57
  • Have you added any primary keys to your table? If you have 3 million rows and your tables have primary keys, indexes, etc. then you should definitely start seeing some performance improvement. – jackstraw22 Jan 09 '18 at 16:20
1

This type of generic search query tends to be rough on performance.

In addition to the suggestion to use temp tables to store the results of the string parsing, there are a couple other things you could do:

Add indexes

It's usually recommended that each table have a clustered index (although it seems there is still room for debate): Will adding a clustered index to an existing table improve performance?

In addition to that, you will probably also want to add indexes on the fields that you're searching on.

In this case, that might be something like:

  1. SKU (for searches on SKU alone)
  2. Store, SKU (for searches on Store and the combination of both Store and SKU)

Keep in mind that if the query matches too many records, these indexes might not be used. Also keep in mind that making the indexes cover the query can improve performance: Why use the INCLUDE clause when creating an index?

Here is a link to Microsoft's documentation on creating indexes: https://learn.microsoft.com/en-us/sql/t-sql/statements/create-index-transact-sql

Use dynamic SQL to build the query

I need to preface this with a warning. Please be aware of SQL injection, and make sure to code appropriately! How to cleanse dynamic SQL in SQL Server -- prevent SQL injection

Building a dynamic SQL query allows you to write more streamlined and direct SQL, and thus allows the optimizer to do a better job. This is normally something to be avoided, but I believe it fits this particular situation.

Here is an example (should be adjusted to take SQL injection into account as needed):

DECLARE @sql VARCHAR(MAX) = '
    SELECT SKU, Store, ColumnA
    FROM myTable
    WHERE 1 = 1
';

IF @skus IS NOT NULL BEGIN
    SET @sql += ' AND SKU IN (' + @skus + ')';
END

IF @stores IS NOT NULL BEGIN
    SET @sql += ' AND Store IN (' + @stores + ')';
END

EXEC sp_executesql @sql;
Soukai
  • 463
  • 5
  • 8
  • I added some indexes as suggested, but didn't see any performance increase. I also tried dynamic sql, this was a great hit! I think the parameter (or argument) sniffing prevents the engine from generating the optimal execution plan. Thanks a lot! – Gabriel Espinoza Jan 09 '18 at 13:56
  • @GabrielEspinoza As your database becomes larger and larger, the more likely you are to see benefits from indexing. – Eka putra Mar 05 '21 at 06:29
1

This will work faster i believe:

SELECT SKU, Store, ColumnA, ColumnB, ColumnC FROM myTable WHERE @skus IS NULL AND @stores IS NULL
UNION ALL 
SELECT SKU, Store, ColumnA, ColumnB, ColumnC 
FROM myTable 
INNER JOIN (select colname AS myskus from splitString(@skus))skuses ON skuses.myskus = myTable.SKU
INNER JOIN (select colname AS mystore from splitString(@stores))stores ON stores.mystore = myTable.Store