
I've created a script to search for candidate tables for partitioning, and using the index information, I'd like to find the ideal column on which to partition. I'm ignoring (for now) which columns are most often queried. I have a basic query below that prioritizes columns that are part of clustered indexes, and then whether they are a primary key.

So, using this admittedly rough approach, what are the best criteria/qualities of an indexed column that I can also include to determine whether it is a good partitioning column? Please provide the reasoning behind those attributes as well.

If you disagree with the approach altogether, that's fine, please comment, but keep in mind that this is for preliminary analysis of partitioning outcome, prior to delving into best practices.

;WITH cte AS
(
    SELECT 
        t.name AS TableName, ind.name AS IndexName, 
        ind.type_desc AS TypeDesc, col.name AS ColumnName,
        st.name AS DataType, ind.is_unique AS IsUnique, 
        ind.is_primary_key AS IsPrimaryKey, 
        ind.is_unique_constraint AS IsUniqueCons, 
        ROW_NUMBER() OVER (PARTITION BY t.name 
                           ORDER BY
                               CASE WHEN ind.type_desc = 'CLUSTERED' THEN 1 ELSE 2 END,
                               CASE WHEN ind.is_primary_key = 1 THEN 1 ELSE 2 END) AS row_num
    FROM 
        sys.indexes ind
    INNER JOIN 
        sys.index_columns ic ON  ind.object_id = ic.object_id 
                             AND ind.index_id = ic.index_id
    INNER JOIN 
        sys.columns col ON ic.object_id = col.object_id  
                        AND ic.column_id = col.column_id
    INNER JOIN 
        sys.tables t ON ind.object_id = t.object_id
    INNER JOIN 
        -- Join on user_type_id, not system_type_id: several types share a
        -- system_type_id (e.g. sysname/nvarchar), which would duplicate rows.
        sys.types st ON col.user_type_id = st.user_type_id
    WHERE 
        t.is_ms_shipped = 0
        AND st.name IN ('datetime', 'smalldatetime')
)
SELECT * 
FROM cte 
WHERE row_num = 1
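Since the comments mention filtering candidate tables by record counts and total size, a companion query along these lines may help. This is a hedged sketch using only the system catalogs; `ApproxRows` is an alias I've introduced here, and the counts from `sys.partitions` are approximate, not transactionally exact:

```sql
-- Approximate row counts per user table, to shortlist candidates by size
-- before running the index-based analysis above.
SELECT t.name       AS TableName,
       SUM(p.rows)  AS ApproxRows
FROM sys.tables t
INNER JOIN sys.partitions p
        ON p.object_id = t.object_id
       AND p.index_id IN (0, 1)   -- count the heap or clustered index only
WHERE t.is_ms_shipped = 0
GROUP BY t.name
ORDER BY ApproxRows DESC;
```

Restricting `index_id` to 0 or 1 avoids double-counting rows once per nonclustered index.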

1 Answer


You cannot detect good partitioning columns from the schema alone. It mostly depends on the workload.

Partitioning is not meant to make queries faster. It can do that, but it's often inferior to normal indexing, and most of the time partitioning makes queries slower: the query has to fit the partitioning scheme to benefit.

Partitioning exists so that you can add and remove partitions in constant time, and so that you can place different data on different storage (with different throughput and different backup schedules).
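To illustrate the "constant time" point, here is a minimal sketch using a hypothetical `dbo.Sales` table partitioned on a `datetime` column; the names `pf_MonthlySales`, `ps_MonthlySales`, and `dbo.SalesArchive` are my own illustrative assumptions, not from the question's schema:

```sql
-- One partition per month; RANGE RIGHT puts each boundary date
-- at the start of its partition.
CREATE PARTITION FUNCTION pf_MonthlySales (datetime)
    AS RANGE RIGHT FOR VALUES ('2016-01-01', '2016-02-01', '2016-03-01');

CREATE PARTITION SCHEME ps_MonthlySales
    AS PARTITION pf_MonthlySales ALL TO ([PRIMARY]);

-- Removing the oldest month is a metadata-only operation: switch the
-- partition out into an empty, identically structured staging table.
ALTER TABLE dbo.Sales SWITCH PARTITION 1 TO dbo.SalesArchive;
TRUNCATE TABLE dbo.SalesArchive;
```

The `SWITCH` moves no data pages, which is why deleting an old partition this way is effectively instant regardless of row count, unlike a `DELETE` over the same range.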

I think your approach can never work. You need to throw it away.

usr
  • The approach is quite good in my case. As I said, for now I'm just trying to get metrics in a partitioning scenario (left/right totals etc.) The clustered indexes generally contain the date field(s) that we'd be partitioning by if we do decide to partition that table. But before I do anything, I want metrics with different dates, and including or excluding certain tables by record counts, total size, etc., so, this approach is working out pretty well so far. And most of the queries should be adhering to the indexes. These tables are so large, I'd be surprised if they weren't. – John Mar 31 '16 at 20:44
  • You have asked what criteria to search for. My answer is: There are none. You will find that this is true for the reasons I specified. `determine whether they are good partitioning columns` Impossible using only schema information. Your approach is doomed. This answer therefore resolves the question that you have asked. – usr Mar 31 '16 at 21:41