Hello SQL gurus out there... I have this month-long problem that I can't seem to solve. I have a really wide (reporting) table, approx. 100+ fields. Right now it has around 2M records, and at the rate the data is coming in, it will probably double that count in the next 1-2 months. It runs acceptably at the moment, but I'm not so sure about the next couple of months. Basically, I just want to know what the best indexing plan would be for this type of table. This is not the actual table, but it is very close; I just want to illustrate the various data types used.

    ID bigint IDENTITY (UNIQUE CLUSTERED INDEX)
    Column1 varchar(4000) NULL
    ...
    Column50 varchar(1000) NULL
    Column60 int NULL
    ...
    Column100 nvarchar(max) NULL

Now, this table is currently queried in this manner:

    SELECT <ALL COLUMNS EXCEPT ID>
    FROM ThisTable WITH (NOLOCK)    -- I know how this works, don't warn me :)
    WHERE
        (@Column1 IS NULL OR CHARINDEX('|' + [Column1] + '|', @Column1) > 0)
        ...
        AND (@Column99 IS NULL OR CHARINDEX('|' + [Column99] + '|', @Column99) > 0)

Possible values for each parameter and field are:

    @Column1 = '|value1|value2|value3|'

    Column1        -- sample values stored in the column itself
    value1
    value2
I know basic execution plan reading, and from what I can see there are a lot of SCANs going on. But as I've said, it seems like I've already hit the wall, which is why I'm here now seeking help :). I know how to index small tables based on what they are intended for, but a table THIS wide?? I can just lol at myself :D

Any ideas, guys? I've read a little about columnstore indexes... this is the most viable solution I can think of, but there's very little information available on this approach at this point.

If you're wondering how I came up with this table: well, its contents come from different tables (flattened). Processing is done on a nightly basis. The results are exported into a CSV file, which then serves as the data source for another reporting application.

Thanks in advance!

  • The problem's in the code, not the database. By applying functions to the column values you are forcing the database to scan the entire table to calculate the values, then check which of them fulfill the criteria. If you want to compare with a list of values, pass a table parameter or join with another table that contains the values you need. What you should be asking is "how to pass multi-valued parameters in a query", not looking for indexing workarounds. You should also google for "Star Schemas", data warehousing, Ralph Kimball's books etc. – Panagiotis Kanavos May 07 '14 at 13:05
  • That is the only approach I can think of based on the requirement to have multiple selection, i.e. from the multiple dropdown selection in the UI, the user can select value1, value2, etc... no selection means ALL. – jackCaller May 07 '14 at 13:11
  • 1
    All columns in select list, charindex in where clause, and ORed as well, and columns that look like a product of conatenation of multiple values. There is no indexing strategy here that can help you. – dean May 07 '14 at 13:12
  • Ok, I forgot to mention I have this table-valued function where it returns the various selections made, used in this way: WHERE Column1 IN dbo.fn_SplitSelections('|value1|value2|value3|','|'), something to this effect. BUT its performance is worse. I'm not sure why, but I have already made the comparison. – jackCaller May 07 '14 at 13:14
  • @dean I'm aware of most of the indexing rules. Yes, no index will be used with that kind of WHERE clause. But I only used that for lack of an index on those fields and to get the desired result. I only used nonclustered indexes on fields with straightforward comparisons (single-value params). If there is another approach where an index would be of benefit with multi-valued params, then that is what I'm looking for. :) – jackCaller May 07 '14 at 13:30
  • @jackCaller Your requirements are neither novel nor unique. You just can't do it this way, and no index will save you. Did you try using a [table-valued parameter](http://msdn.microsoft.com/en-us/library/bb675163(v=vs.110).aspx) to pass a list of values, instead of concatenated strings? (See the sketch after these comments.) – Panagiotis Kanavos May 07 '14 at 13:42
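
As a rough illustration of the table-valued-parameter approach suggested in these comments, here is a minimal sketch. The type, procedure, and parameter names are hypothetical, not from the original post:

    -- Hypothetical sketch: pass the selections as a table-valued parameter
    -- instead of a '|'-delimited string. All names here are illustrative.
    CREATE TYPE dbo.SelectionList AS TABLE (Value varchar(100) NOT NULL PRIMARY KEY);
    GO

    CREATE PROCEDURE dbo.GetReport
        @Column1List dbo.SelectionList READONLY   -- empty table means "ALL"
    AS
    BEGIN
        SELECT t.Column1, t.Column50, t.Column60  -- narrowed, illustrative column list
        FROM ThisTable AS t
        WHERE NOT EXISTS (SELECT 1 FROM @Column1List)         -- no selection: return everything
           OR t.Column1 IN (SELECT Value FROM @Column1List);  -- plain value match, can use an index
    END
    GO

From client code (e.g. ADO.NET), the parameter is passed as a DataTable, so the server never has to parse delimited strings at all.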

1 Answer


OK, if you already have a table-valued function, you can probably use it to speed things up. But beware that it makes no sense to use it directly in the query. First select from the function into #temp table(s), and then join to them.
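
For example, a minimal sketch of that pattern, assuming (for illustration only) that dbo.fn_SplitSelections returns a single column named Value:

    -- Materialize the split selections once...
    SELECT Value
    INTO #Column1Values
    FROM dbo.fn_SplitSelections(@Column1, '|');

    -- ...index the temp table so the join can seek...
    CREATE CLUSTERED INDEX IX_Value ON #Column1Values (Value);

    -- ...then join, instead of calling the function in the WHERE clause.
    SELECT t.*
    FROM ThisTable AS t
    JOIN #Column1Values AS v ON v.Value = t.Column1;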

To deal with the ORs, use dynamic SQL with sp_executesql, or OPTION (RECOMPILE) (see my answer to a similar question for a sample).
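
A minimal sketch of the dynamic SQL variant, reusing the hypothetical #Column1Values temp table from above (the column list placeholder is left as-is):

    DECLARE @sql nvarchar(max) = N'SELECT <column list> FROM ThisTable WHERE 1 = 1';

    -- Append a predicate only when its parameter was actually supplied, so the
    -- optimizer never sees the '@p IS NULL OR ...' pattern.
    IF @Column1 IS NOT NULL
        SET @sql += N' AND Column1 IN (SELECT Value FROM #Column1Values)';
    -- ...repeat per parameter...

    EXEC sp_executesql @sql;

The OPTION (RECOMPILE) alternative keeps the static query as-is but compiles a fresh plan on every execution using the actual parameter values, which lets the optimizer prune the NULL branches.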

If you can narrow down the select list, you might use covering indexes.
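
For instance, a sketch using the illustrative column names from the question:

    -- Illustrative covering index for a query that filters on Column1 and
    -- returns only Column50 and Column60. Beware: Column1 is varchar(4000)
    -- in the sample schema, and index keys are limited to 900 bytes.
    CREATE NONCLUSTERED INDEX IX_ThisTable_Column1
    ON ThisTable (Column1)
    INCLUDE (Column50, Column60);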

Columnstore indexes are great for aggregations on a single column (because with a CSI all values for a column are grouped together, as opposed to row-oriented storage), but they won't help in your case.
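
For contrast, here is the kind of workload where a columnstore index does pay off, sketched against the illustrative schema above. (Note that on SQL Server 2012, adding a nonclustered columnstore index makes the table read-only.)

    -- Columnstore shines when aggregating one or a few columns over many rows.
    CREATE NONCLUSTERED COLUMNSTORE INDEX CSI_ThisTable
    ON ThisTable (Column60);

    SELECT SUM(Column60) AS Total
    FROM ThisTable;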

dean
  • Sounds like a good plan :) I'm already out of the office, so I'll try this out during our day tomorrow (GMT+8 here). Will give you feedback on any results. Thanks. – jackCaller May 07 '14 at 14:43
  • That's a great point, to try to segment the process if there's at least some chance of narrowing on the most popular columns. It also makes me think of forking the stored proc to be able to have separate execution plans to match. – Mike M Jun 11 '14 at 01:52
  • i.e. if the first filter is Column1, call a proc made and compiled to hit its index... if the first filter is Column2, call its separate proc... etc. – Mike M Jun 11 '14 at 01:55