
EDIT

@Remus corrected my test pattern. You can see the corrected version in his answer below.

I took the suggestion of replacing the INT with DECIMAL(29,0) and the results were:

Decimal: 2133
GUID: 1836

Random inserts are still winning, even with a fractionally bigger row.
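
For reference, a minimal sketch of the DECIMAL(29,0) variant of T1 used for this comparison (the table and constraint names here are placeholders, not from the original test); DECIMAL(29,0) is stored in 17 bytes, close to the 16-byte uniqueidentifier:

-- Hypothetical DECIMAL(29,0) variant of T1; names are placeholders
CREATE TABLE [dbo].[T1_Dec](
    [ID] [decimal](29,0) NOT NULL,
 CONSTRAINT [T1_Dec_1] PRIMARY KEY CLUSTERED ([ID] ASC)
)
GO
-- the insert loop is unchanged; the int counter converts implicitly:
-- insert into T1_Dec values (@i)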

The explanations I've received say that random inserts should be slower than sequential ones, yet these benchmarks show them to be faster. The explanations aren't agreeing with the benchmarks, so my question remains focused on b-trees, sequential inserts, and speed.

...

I know from experience that b-trees have awful performance when data is added to them sequentially (regardless of the direction). However, when data is added randomly, the best performance is obtained.

This is easy to demonstrate with the likes of an RB-tree: sequential writes cause the maximum number of rebalances to be performed.

I know very few databases use binary trees, but rather use n-order balanced trees. I logically assume they suffer a similar fate to binary trees when it comes to sequential inputs.

This sparked my curiosity.

If this is so, then one could deduce that writing sequential IDs (such as in IDENTITY(1,1)) would cause multiple re-balances of the tree to occur. I have seen many posts argue against GUIDs as "these will cause random writes". I never use GUIDs, but it struck me that this "bad" point was in fact a good point.

So I decided to test it. Here is my code:

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[T1](
    [ID] [int] NOT NULL
 CONSTRAINT [T1_1] PRIMARY KEY CLUSTERED ([ID] ASC) 
)
GO

CREATE TABLE [dbo].[T2](
    [ID] [uniqueidentifier] NOT NULL
 CONSTRAINT [T2_1] PRIMARY KEY CLUSTERED ([ID] ASC)
)

GO

declare @i int, @t1 datetime, @t2 datetime, @t3 datetime

set @t1 = GETDATE()
set @i = 1

while @i < 2000 begin
    insert into T2 values (NEWID())
    set @i = @i + 1
end

set @t2 = GETDATE()
WAITFOR delay '0:0:10'
set @t3 = GETDATE()
set @i = 1

while @i < 2000 begin
    insert into T1 values (@i)
    set @i = @i + 1
end

select DATEDIFF(ms, @t1, @t2) AS [GUID], DATEDIFF(ms, @t3, getdate()) AS [Int]

drop table T1
drop table T2

Note that I am not subtracting any time for the creation of the GUID nor for the considerably extra size of the row. The results on my machine were as follows:

Int: 17,340 ms GUID: 6,746 ms

This means that in this test, random inserts of 16 bytes were almost 3 times faster than sequential inserts of 4 bytes.
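
Not part of the original test, but a sketch of one way to see how differently the two clustered indexes end up laid out: query sys.dm_db_index_physical_stats before the DROP TABLE statements (DB_ID()/OBJECT_ID() here assume the tables live in the current database):

-- Sketch: leaf-level layout of the two clustered indexes (run before the DROP TABLEs)
SELECT OBJECT_NAME(ips.[object_id])        AS [table],
       ips.page_count,                     -- leaf pages used
       ips.avg_page_space_used_in_percent, -- page density; random inserts tend to leave split pages half full
       ips.avg_fragmentation_in_percent    -- logical fragmentation caused by page splits
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'DETAILED') AS ips
WHERE ips.[object_id] IN (OBJECT_ID('dbo.T1'), OBJECT_ID('dbo.T2'))
  AND ips.index_level = 0;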

Would anyone like to comment on this?

P.S. I realize this isn't strictly a question; it's an invitation to discussion, and that discussion is relevant to learning optimal programming.

IamIC
  • Add a char(3000) column or a char(500) column to reduce rows-per-page density. What happens when it's run a 2nd time? Did the DB have to grow? Then add a non-clustered index on the char column (if < 900). I'd prefer something more related to real life... – gbn Jan 04 '11 at 20:31
  • I did that (char(3000)). Results: Int: 7,406; GUID: 22,286. Char(300): Int: 6630, GUID: 5,816. Why? – IamIC Jan 04 '11 at 20:38

3 Answers


Flip the operation and the int is faster. Have you taken into account log and data file growth? Run each separately:

declare @i int, @t1 datetime, @t2 datetime

set @t1 = GETDATE()
set @i = 1

while @i < 10000 begin
    insert into T2 values (NEWID())
    set @i = @i + 1
END


set @t2 = GETDATE()
set @i = 1

while @i < 10000 begin
    insert into T1 values (@i)
    set @i = @i + 1
end



select DATEDIFF(ms, @t1, @t2) AS [UID], DATEDIFF(ms, @t2, getdate()) AS [Int]

The problem with UUIDs, when clustering on them and not using NEWSEQUENTIALID(), is that they cause page splits and fragmentation of the table.
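
For comparison, a minimal sketch of the NEWSEQUENTIALID() approach mentioned above (T3 and the constraint names are placeholders; NEWSEQUENTIALID() can only be used in a DEFAULT constraint):

-- Sketch: sequential GUIDs via a column default
CREATE TABLE [dbo].[T3](
    [ID] [uniqueidentifier] NOT NULL
        CONSTRAINT [DF_T3_ID] DEFAULT (NEWSEQUENTIALID()),
    CONSTRAINT [T3_1] PRIMARY KEY CLUSTERED ([ID] ASC)
)
GO

-- each insert takes the default, so keys arrive in ascending order and new
-- rows go to the end of the clustered index instead of splitting full pages
INSERT INTO [dbo].[T3] DEFAULT VALUES;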

Now try it like this and you'll see it is almost the same:

declare @i int, @t1 datetime, @t2 datetime

set @t1 = GETDATE()
set @i = 1

while @i < 10000 begin
    insert into T2 values (NEWID())
    set @i = @i + 1
END
select DATEDIFF(ms, @t1, getdate()) 

set @t1 = GETDATE()
set @i = 1

while @i < 10000 begin
    insert into T1 values (@i)
    set @i = @i + 1
end



select DATEDIFF(ms, @t1, getdate())

And reversed

declare @i int, @t1 datetime, @t2 datetime



set @t1 = GETDATE()
set @i = 1

while @i < 10000 begin
    insert into T1 values (@i)
    set @i = @i + 1
end
select DATEDIFF(ms, @t1, getdate())

set @t1 = GETDATE()
set @i = 1

while @i < 10000 begin
    insert into T2 values (NEWID())
    set @i = @i + 1
END
select DATEDIFF(ms, @t1, getdate())
SQLMenace
  • I tried it and you are correct. I would never use a UID as a key, though. My question is really about the b-tree reordering. – IamIC Jan 04 '11 at 20:43
  • "the problem with UUIDs is when clustering on them and not using NEWSEQUENTIALID() is that they cause page breaks and fragmentation of the table" - is this because of the randomness of the inserts? – IamIC Jan 04 '11 at 20:44
  • yes, that is correct, since it needs to make space on a page and then it does a split; with NEWSEQUENTIALID() this does not happen – SQLMenace Jan 04 '11 at 20:46
  • Of course, if someone had a large varchar as a PK, and the values were somewhat random, this could be even worse. – IamIC Jan 04 '11 at 20:55

You are not measuring the INSERT speed. You are measuring your log flush performance. Since you commit after each INSERT, all these tests are doing is sitting around waiting for the commit to harden the log. That is hardly relevant for INSERT performance. And please don't post 'performance' measurements when SET NOCOUNT is OFF...

So let's try this without unnecessary server-client chatter, with properly sized data, batch commits, and a pre-grown database:

:setvar dbname testdb
:setvar testsize 1000000
:setvar batchsize 1000

use master;
go

if db_id('$(dbname)') is not null
begin
    drop database [$(dbname)];
end
go

create database [$(dbname)] 
    on (name='test_data', filename='c:\temp\test_data.mdf', size=10gb)
    log on (name='test_log', filename='c:\temp\test_log.ldf', size=100mb);
go

use [$(dbname)];
go  

CREATE TABLE [dbo].[T1](
    [ID] [int] NOT NULL
 CONSTRAINT [T1_1] PRIMARY KEY CLUSTERED ([ID] ASC) 
)
GO

CREATE TABLE [dbo].[T2](
    [ID] [uniqueidentifier] NOT NULL
 CONSTRAINT [T2_1] PRIMARY KEY CLUSTERED ([ID] ASC)
)
GO

set nocount on;
go

declare @i int, @t1 datetime, @t2 datetime

set @t1 = GETDATE()
set @i = 1

begin transaction;
while @i < $(testsize) begin
    insert into T1 values (@i)
    set @i = @i + 1
    if @i % $(batchsize) = 0
    begin
        commit;
        begin transaction;
    end
end
commit

set @t2 = GETDATE()
set @i = 1
begin transaction
while @i < $(testsize) begin
    insert into T2 values (NEWID())
    set @i = @i + 1
    if @i % $(batchsize) = 0
    begin
        commit;
        begin transaction;
    end
end
commit

select DATEDIFF(ms, @t1, @t2) AS [Int], DATEDIFF(ms, @t2, getdate()) AS [UID]

drop table T1
drop table T2

INTS: 18s
GUIDS: 23s

QED

Remus Rusanu
  • @Remus, I didn't set SET NOCOUNT OFF. Why would that affect this benchmark, though? – IamIC Jan 04 '11 at 21:04
  • @IanC by default NOCOUNT is OFF, you have to make it ON explicitly – SQLMenace Jan 04 '11 at 21:07
  • SET NOCOUNT OFF (the default) sends the rowcount back to the client after each insert (`1 rows inserted`). So the server now has to wait for the client to consume these messages. – Remus Rusanu Jan 04 '11 at 21:08
  • Thanks. I haven't seen ":setvar" before. It's giving me a syntax error. Do I need to wrap it in something? – IamIC Jan 04 '11 at 21:12
  • You need to enable SQLCMD extensions in SSMS, see http://msdn.microsoft.com/en-us/library/ms174187.aspx – Remus Rusanu Jan 04 '11 at 21:13
  • Tools\Options\Query Execution check `By default, open new queries in SQLCMD mode` – Remus Rusanu Jan 04 '11 at 21:14
  • Also make sure Instant File Initialization is enabled, don't want to wait for 10Gb file zero-out... see http://msdn.microsoft.com/en-us/library/ms175935.aspx – Remus Rusanu Jan 04 '11 at 21:22
  • @Remus I had a long wait :-) You sure have some fast hardware! About 12X mine. – IamIC Jan 04 '11 at 21:29
  • I'm running these on a laptop (and as most laptops, it does have disk write caching enabled, true)... Still, looks like you have a really slow hardware :) – Remus Rusanu Jan 04 '11 at 21:31
  • @Remus I just realized I miscalculated. It happens at 5:30 am. I'm also on a laptop, 2GHZ Dual core, 5400 rpm... nothing poor or special. I'm running SQL Server 2008 Express, so I changed the size to 4GB and the count to 100k. I got 2s and 2.7s. – IamIC Jan 04 '11 at 21:38
  • @Remus, ok, you proved your point. However, what I don't get is that with the sequential int, there should be many tree reorders, which would translate into heavy disk I/O. This question still stands. – IamIC Jan 04 '11 at 21:40
  • Yes, about tree reordering: there isn't any, in either case. B-Trees are self-balancing and there is no re-ordering or re-organisation of the tree, ever. The only 'overhead' is due to page splits. Sequential inserts cause fewer page splits than random inserts. In these two tests, the GUIDs perform slower due to two factors: larger key size (more pages overall) and random inserts (causing page splits). If one uses NEWSEQUENTIALID instead of NEWID then the random inserts are eliminated, and in my tests it results in about ~21s, the difference from the INTs' 18s being due to the wider key, 16 vs 4 bytes – Remus Rusanu Jan 04 '11 at 21:44
  • Ok, please bear with me here. If a page in the b-tree fills, surely it has to be split and all relevant pointers updated, just the same as it would in RAM in C? Otherwise how can it remain balanced? – IamIC Jan 04 '11 at 21:52
  • Let's see if it fits in 600 chars: both random inserts and sequential inserts *may* cause a page split. A sequential insert will need a new page for the new key but *no keys have to be moved*. This makes it a cheap operation: allocate a new page, modify the previous last page's `next` 'pointer' to point to the new page, done. For a random insert a page split will *have to move keys*. So it is much more expensive: allocate a new page, copy about half of the rows from the old page to the new one, then fix the `next` link on the previous page *and* the `prev` link on the next page. More work, more time. – Remus Rusanu Jan 04 '11 at 22:31
  • Page splits triggered by both sequential and random inserts will also have to insert a key at the next, upper level. But this insert works *exactly* like the process explained before: a sequential leaf-page insert will trigger a sequential upper-page insert, and a random one will trigger a random one. The same cost considerations apply. So you are right that a full page will have to be split and all pointers updated etc. But the work done for a random-insert page split is *significantly* more than for a sequential-insert-triggered page split, mostly because in the sequential case *no rows have to be moved*. – Remus Rusanu Jan 04 '11 at 22:34
  • btw, out of interest, adding padding of char(12) to the int table, which makes the two tables' row size equal (+/- overhead), turns the tables and the GUID wins 2160 vs 1786 ms. – IamIC Jan 04 '11 at 22:36
  • With INT+padding the guids have an unfair advantage: they don't have any columns in the rows, while the ints have a column. To be fair, use a DECIMAL(29,0), which will store in 17 bytes (http://msdn.microsoft.com/en-us/library/ms187746.aspx). – Remus Rusanu Jan 04 '11 at 22:39
  • @Remus I took your suggestion of replacing the INT with DECIMAL(29,0) and the results were: Decimal: 2133, GUID: 1836. Random inserts are still winning, even with a fractionally bigger row. Your comment "Lets see..." seems to be describing a linked list, not a tree. – IamIC Jan 05 '11 at 09:43
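
Following up on the page-split discussion in the comments above, a rough sketch (not from the original thread) of how the split activity can be observed with sys.dm_db_index_operational_stats. Note that the counters include ordinary end-of-index page allocations as well, so for T1 they mostly reflect sequential growth, while for T2 they include the mid-page splits:

-- Sketch: compare leaf-page allocations for the two clustered indexes
-- after the insert loops, before dropping the tables
SELECT OBJECT_NAME(ios.[object_id]) AS [table],
       ios.leaf_allocation_count,    -- leaf-level page allocations (page splits, incl. end-of-index growth)
       ios.nonleaf_allocation_count  -- allocations propagated to the upper levels
FROM sys.dm_db_index_operational_stats(DB_ID(), NULL, NULL, NULL) AS ios
WHERE ios.[object_id] IN (OBJECT_ID('dbo.T1'), OBJECT_ID('dbo.T2'));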

I expect rebalancing of an index to be a minor problem in a real database, because lots of index entries will fit in a single block, and as long as they do, little rebalancing is needed.

What might become more of an issue is contention on that single block containing all the new entries. Oracle has a feature to store the bytes of the key in reverse order to spread new entries out over all blocks: http://oracletoday.blogspot.com/2006/09/there-is-option-to-create-index.html I don't know about other databases.
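
For reference, a sketch of the Oracle syntax described at that link (table and index names are placeholders):

-- Oracle: a reverse key index stores each key's bytes reversed,
-- spreading otherwise-sequential values across many leaf blocks
CREATE INDEX t1_id_rev ON t1 (id) REVERSE;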

Jens Schauder
  • I recently wanted to test something. I created a table with two int columns, the 1st PK IDENTITY(1,1), and the second had its own index. It took just over 24 hours to insert 16M records... that's < 200 a second. This horrible performance is what prompted this post. – IamIC Jan 04 '11 at 20:58
  • perhaps Oracle does that to avoid the issue I'm mentioning. – IamIC Jan 04 '11 at 20:59
  • The last page insert latch contention would only be visible on high-end systems (16-32 CPUs and higher). See http://sqlcat.com/technicalnotes/archive/2009/09/22/resolving-pagelatch-contention-on-highly-concurrent-insert-workloads-part-1.aspx – Remus Rusanu Jan 04 '11 at 21:04
  • @IanC: 200 inserts per second is most likely due to lack of batch commit. Around 200-350 log flushes per second is what a normal IO subsystem can drive. You *must* commit in batches to achieve any performance on ETL. Normal OLTP loads benefit from many concurrent users and the cost of log flush is amortized over many concurrent requests, so it is not such a big issue. – Remus Rusanu Jan 04 '11 at 21:07
  • @Remus I saw that from your code, thanks. I almost never run such inserts, so I never knew the benefits of what you're saying. But it's noted now. – IamIC Jan 04 '11 at 21:14