1

I have two tables in database Products and Filters.

The schema:

enter image description here

I have created a query that find all the records from the filters table, loop with each record and call a procedure that set the category id for Products table.

Filter table data will be as follow.

enter image description here

The filter selection query is as follow..

DECLARE @TotalRecords INT, @Start INT, @Limit INT, @CatId INT, @Merchants NVARCHAR(max), @NotMatch NVARCHAR(max), @WillMatch NVARCHAR(max);
SELECT @TotalRecords = COUNT(*) FROM filters;

SET @Limit = 1;
SET @Start = 0;

WHILE(@TotalRecords > 0)
BEGIN       
    SELECT @CatId = category_id, @Merchants = merchant_name, @NotMatch = not_match, @WillMatch = will_match FROM 
    (
        SELECT TOP (@Start + @Limit) *, ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS rnum 
        FROM filters
    ) a
    WHERE rnum > @Start;

    -- call filter procedure.
    exec procSetProductCategory @CatId = @CatId, @Merchants = @Merchants, @WillMatch = @WillMatch, @NotMatch = @NotMatch;

    SET @Start += 1;
    SET @TotalRecords -= 1;
END

And the procSetProductCategory as follow..

CREATE PROC [dbo].[procSetProductCategory]
(
    @CatId INT = NULL,
    @Merchants NVARCHAR(max),
    @NotMatch NVARCHAR(max),
    @WillMatch NVARCHAR(max)
)
AS
BEGIN
SET NOCOUNT ON

    declare @query nvarchar(max), @orToken nvarchar(max), @andToken nvarchar(max);
     set @query = 'UPDATE Products SET category_id = '+ convert(nvarchar(20), @CatId) + ' WHERE category_id IS NULL AND merchant_name IN(' + @Merchants + ')';

    if(@WillMatch is not null AND LTRIM(RTRIM(@WillMatch)) != '')
    BEGIN

        set @andToken = '%'' AND product_name LIKE ''%';
        set @WillMatch = REPLACE(@WillMatch, '+', @andToken);

        set @orToken = '%'') OR (product_name LIKE ''%';
        set @query = @query + ' AND ((product_name LIKE '''+ '%' + REPLACE(@WillMatch, ',', @orToken) + '%''))';
    END

    if(@NotMatch is not null AND LTRIM(RTRIM(@NotMatch)) != '')
    BEGIN
        set @andToken = '%'' AND product_name NOT LIKE ''%';
        set @NotMatch = REPLACE(@NotMatch, '+', @andToken);

        set @orToken = '%'') OR (product_name NOT LIKE ''%';
        set @query = @query + ' AND ((product_name NOT LIKE '''+ '%' + REPLACE(@NotMatch, ',', @orToken) + '%''))';
    END

    EXECUTE sp_executesql @query;
END

It generates the sql query like following...

Query #1
-------------------------------------------------------------------------------------------------------
UPDATE Products SET category_id = 101 WHERE merchant_name IN('merchant 1','merchant 4','merchant 3') AND 
 (
    (product_name LIKE '%abcd%' AND product_name LIKE '%efhg%')
 ) AND (
    (product_name NOT LIKE '%3258%')
     OR (product_name NOT LIKE '%yxzs%')
)


Query #2
-------------------------------------------------------------------------------------------------------
UPDATE Products SET category_id = 102 WHERE merchant_name IN('merchant 3', 'merchant 4') AND 
(
    (product_name LIKE '%1258%') OR (product_name LIKE '%abcd%')
)

Note there are some trick used here.

[,] is used to differentiate match phrases. [+] in match fields used for two match phrases with AND conditions.

These query doing the same what I needed..

enter image description here

Issue is that when I run this query with 500 000 products its using about 100% CPU.

How can we optimize the query that doesn't take impact on result but can reduce the CPU usage?

James Z
  • 12,209
  • 10
  • 24
  • 44
Vikash Pathak
  • 3,444
  • 1
  • 18
  • 32
  • Which query are you referring to? – Gordon Linoff Feb 08 '16 at 14:27
  • To optimize both of the query.. used for fatching/iterating the filter settings and the query for procSetProductCategory.. – Vikash Pathak Feb 08 '16 at 14:30
  • 1
    Why on earth are you storing merchant as a comma delimited list? Why not a separate record for each merchant, category? And I don't understand the data in the will match and not match fields at all. But ingeneral your deign is very flawed which is why it is hard to query. – HLGEM Feb 08 '16 at 16:27
  • We have few conditions here.like a certain filter will applied on products supplied by a group of merchants. And 'WillMatch' field indicate at least one of the word given by comma separated should be match in the product name. 'NotMatch' indicates that none of each word should be match with product name(should not exists anywhere inside the product name). Two words adjoin with + means both words should be exists in the product name order doesn't matter. So these are few complication here. – Vikash Pathak Feb 09 '16 at 05:53

2 Answers2

1

Without a query plan, it's hard to be sure, but I'm guessing this is because you are matching on '%something%', which means the query has to inspect every row.

That's always going to be slow, and there's nothing much you can do to help with indexing.

If you're doing text comparison, you may get better performance by using SQL Server's full text matching feature.

Neville Kuyt
  • 29,247
  • 1
  • 37
  • 52
  • I have tried that also.. but some problem because I need the token matching with negative and positive phrases (will_match, not_match). – Vikash Pathak Feb 08 '16 at 14:39
1

For starters, as pointed out already: there really is something wrong with the logic here. That said, assuming you are stuck with it there are some things you might want to try. My first question would be: how long does this thing run? You shouldn't worry too much that it takes 100% CPU; the question is how much time it takes to finish.

Query1:

It seems that you are creating a loop over the the filters table, fetching every single row, one by one.

  • SQL isn't optimised to do row-by-row operations; you really should consider changing the logic to something set-based
  • If you really want to do something row by row, then please use a CURSOR and not the current approach.
    • First you go over the entire table to count how many filters there are
    • Then you go over the entire table and order the records by SELECT 1
    • Out of the sorted list you pick one that has rnum bigger than your counter

=> This is wrong in so many ways, it actually hurts =(

  • If you sort/order by SELECT 1 then it could return the records in the order ABCD the first time and BADC the second time; and both answers would be correct because you're sorting by a constant: the actual order of the records doesn't matter!
  • Each and every time you go through the loop, the server has to sort the entire table before it can tell which rnum values will fit the requirement of being greater than @start; EVERY TIME!
  • There will be many records that fit rnum > @start, the returned record being used to fill up the records could be any one of them!

To 'fix' this I'd suggest to use the following approach:

DECLARE @TotalRecords INT, 
        @Start INT, 
        @Limit INT, 
        @CatId INT, 
        @Merchants NVARCHAR(max), 
        @NotMatch NVARCHAR(max), 
        @WillMatch NVARCHAR(max);

DECLARE filter_loop CURSOR LOCAL FAST_FORWARD
    FOR SELECT category_id, 
               merchant_name,
               not_match,
               will_match
          FROM filters
         ORDER BY id -- not required but makes debugging easier
OPEN filter_loop 
FETCH NEXT FROM filter_loop INTO @CatId, @Merchants, @NotMatch, @WillMatch
WHILE @@FETCH_STATUS = 0
    BEGIN

        -- call filter procedure.
        exec procSetProductCategory @CatId = @CatId, @Merchants = @Merchants, @WillMatch = @WillMatch, @NotMatch = @NotMatch;

        -- get next filter
        FETCH NEXT FROM filter_loop INTO @CatId, @Merchants, @NotMatch, @WillMatch
    END
CLOSE filter_loop 
DEALLOCATE filter_loop 

Query2:

At first sight there is very little I can do about the stored procedure itself. There is some dynamic sql string building that might be optimized a little bit but I very much doubt it will make much impact. As it is right now it's fairly readable, so I'd leave it as is. The generated query then indeed looks something like this:

UPDATE Products 
   SET category_id = 101 
 WHERE merchant_name IN ('merchant 1','merchant 4','merchant 3') 
   AND ((product_name LIKE '%abcd%' AND product_name LIKE '%efhg%') ) 
   AND ((product_name NOT LIKE '%3258%') OR (product_name NOT LIKE '%yxzs%'))

for which I'd advice to create the following index:

CREATE INDEX idx_test ON Products (merchant_name) INCLUDE product_name)

Afterthoughts

Even with the changes above in place, this will still run for quite a while when working on 100k+ records. The only real solution around this would be to use a set-based approach, but would require either a gargantuan dynamic sql string; or some better knowledge about the data itself. E.g. You might try combining different Filters records that have the same Merchants value but different Match/NoMatch... Probably not too difficult, but I'd suggest to start with the suggestions above first and then see where you end up.

deroby
  • 5,902
  • 2
  • 19
  • 33