Insert sorted data to a table with nonclustered index

Question

My db schema:

Table point (point_id int PK, name varchar);
Table point_log (point_log_id int PK, point_id int FK, timestamp datetime, value int)

point_log has an index:

point_log_idx1 (point_id asc, timestamp asc)

I need to insert point log samples into the point_log table. Each transaction inserts log samples for only one point_id, and the samples are already sorted in ascending order. That means all the log samples in a transaction are already in the order of the index (point_log_idx1). How can I make SQL Server take advantage of this to avoid the tree search cost?

A SQL Server table never is "sorted" in any way. The only way to get something sorted is by using a `SELECT` with an **explicit** `ORDER BY` statement. — marc_s, Dec 30 '14 at 08:29

score 0 · Answer 1 · edited May 23 '17 at 10:33

This looks like a good opportunity for changing the clustered index on Point_Log to cluster by its parent point_id Foreign Key:

CREATE TABLE Point_log
( 
    point_log_id int PRIMARY KEY NONCLUSTERED, 
    point_id int, 
    timestamp datetime, 
    value int
);

And then:

CREATE CLUSTERED INDEX C_PointLog ON dbo.Point_log(point_id);

Rationale: This will reduce the read IO on point_log when fetching point_log records for a given point_id.

Moreover, given that SQL Server will add a 4-byte uniquifier to a non-unique clustered index, you may as well include the surrogate PK in the clustered index to make it unique, viz:

CREATE UNIQUE CLUSTERED INDEX C_PointLog ON dbo.Point_log(point_id, point_log_id);

The nonclustered index point_log_idx1 (point_id asc, timestamp asc) would need to be retained if you have a large number of point_log rows per point, assuming queries filter selectively on point_log.point_id and point_log.timestamp.

score 0 · Accepted Answer · answered Dec 31 '14 at 02:55

The tree search cost is probably negligible compared to the cost of physical writing to disk and page splitting and logging.

1) You should definitely insert data in bulk, rather than row by row.

2) To reduce page splitting in the point_log_idx1 index, you can try using ORDER BY in the INSERT statement. It still doesn't guarantee the physical order on disk, but it does guarantee the order in which the point_log_id IDENTITY values are generated, and hopefully it hints to the engine to process the source rows in that order. If the source data is processed in the requested order, the b-tree of the point_log_idx1 index can grow without unnecessary, costly page splits.

I'm using SQL Server 2008. I have a system that collects a lot of monitoring data in a central database 24/7. Originally I was inserting data as it arrived, row by row. Then I realized that each insert was a separate transaction and that the system spent most of its time writing to the transaction log.

Eventually I moved to inserting data in batches using a stored procedure that accepts a table-valued parameter. In my case a batch is a few hundred to a few thousand rows. In my system I keep data only for a given number of days, so I regularly delete obsolete data. To keep the system performance stable I rebuild my indexes regularly as well.

In your example, it may look like the following.

First, create a table type:

CREATE TYPE [dbo].[PointValuesTableType] AS TABLE(
    point_id int,
    timestamp datetime,
    value int
)

Then the procedure would look like this:

CREATE PROCEDURE [dbo].[InsertPointValues]
    -- Batch of samples to insert, passed as a table-valued parameter
    @ParamRows dbo.PointValuesTableType READONLY
AS
BEGIN
    -- SET NOCOUNT ON added to prevent extra result sets from
    -- interfering with SELECT statements.
    SET NOCOUNT ON;

    BEGIN TRANSACTION;
    BEGIN TRY

        INSERT INTO dbo.point_log
            (point_id
            ,timestamp
            ,value)
        SELECT
            TT.point_id
            ,TT.timestamp
            ,TT.value
        FROM @ParamRows AS TT
        ORDER BY TT.point_id, TT.timestamp;

        COMMIT TRANSACTION;
    END TRY
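    BEGIN CATCH
        -- Undo the partial batch, then re-raise so the caller sees the failure
        -- (RAISERROR rather than THROW, since THROW is not available in SQL Server 2008).
        IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
        DECLARE @ErrMsg nvarchar(2048) = ERROR_MESSAGE();
        RAISERROR('%s', 16, 1, @ErrMsg);
    END CATCH;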

END
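
For illustration, a call to the procedure above might look like the sketch below. The sample rows and the point_id value 42 are made up; it also assumes that point_log_id is an IDENTITY column (as implied earlier) and that a matching row already exists in the point table, otherwise the foreign key would reject the insert.

```sql
-- Hypothetical batch for a single point; the values are illustrative only.
DECLARE @Rows dbo.PointValuesTableType;

-- Multi-row VALUES is available from SQL Server 2008 onwards.
INSERT INTO @Rows (point_id, [timestamp], value)
VALUES
    (42, '2014-12-30T08:00:00', 10),
    (42, '2014-12-30T08:01:00', 12),
    (42, '2014-12-30T08:02:00', 11);

-- One call inserts the whole batch inside a single transaction.
EXEC dbo.InsertPointValues @ParamRows = @Rows;
```

A client application would normally fill the table-valued parameter directly instead of building it in T-SQL; the variable here only stands in for that.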

In practice you should measure for your system what is more efficient, with ORDER BY, or without. You really need to consider performance of the INSERT operation as well as performance of subsequent queries.

It may be that faster inserts lead to higher fragmentation of the index, which leads to slower queries.

So, you should check the fragmentation of the index after INSERT with ORDER BY or without. You can use sys.dm_db_index_physical_stats to get index stats.

"Returns size and fragmentation information for the data and indexes of the specified table or view in SQL Server."
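
A minimal sketch of such a fragmentation check with sys.dm_db_index_physical_stats, run once after loading with ORDER BY and once without, could look like this. It assumes the table lives in the dbo schema of the current database; the 'LIMITED' scan mode is chosen here only because it is the cheapest one.

```sql
-- Compare fragmentation of every index on point_log after a test load.
SELECT
    i.name AS index_name,
    ps.index_type_desc,
    ps.avg_fragmentation_in_percent,
    ps.page_count
FROM sys.dm_db_index_physical_stats(
        DB_ID(),                      -- current database
        OBJECT_ID(N'dbo.point_log'),  -- table from the question (dbo schema assumed)
        NULL,                         -- all indexes on the table
        NULL,                         -- all partitions
        'LIMITED'                     -- cheapest scan mode
     ) AS ps
JOIN sys.indexes AS i
    ON i.object_id = ps.object_id
   AND i.index_id  = ps.index_id
ORDER BY ps.avg_fragmentation_in_percent DESC;
```

If point_log_idx1 turns out to be heavily fragmented, the periodic rebuild mentioned earlier can be done with `ALTER INDEX point_log_idx1 ON dbo.point_log REBUILD;`. What counts as "heavily" is something to decide from your own measurements.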
