
I am working on MS SQL Server.

I have a table called "User" with three columns and the default index that is created with the table's primary key, UserId.

I have a Word file that contains user information line by line. There are almost 10,000 lines.

I have a program that reads the user information from the Word file and inserts it into the database. It is written in C# in Visual Studio. The program uses the repository and unit-of-work patterns.

The program workflow is as follows: 1) read a single line of user information from the Word file, 2) create a user object based on the information, 3) write the object to the repository, 4) commit the work, at which point the database insert statement executes.

Basically, the program executes an "insert statement" each time it reads user information from the Word file.
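
For context, here is a minimal sketch of what that workflow boils down to at the database level. The connection string, file name, column names, and line format below are assumptions for illustration, not the actual code:

    using System.Data.SqlClient;
    using System.IO;

    class RowByRowInsert
    {
        static void Main()
        {
            // Assumed connection string and an assumed plain-text export of the Word file.
            var connectionString = "Server=.;Database=MyDb;Integrated Security=true;";
            using (var connection = new SqlConnection(connectionString))
            {
                connection.Open();
                foreach (var line in File.ReadLines("users.txt"))
                {
                    var parts = line.Split('\t'); // assumed format: address <TAB> phone number
                    using (var command = new SqlCommand(
                        "INSERT INTO [User] (Address, PhoneNumber) VALUES (@address, @phone)", connection))
                    {
                        command.Parameters.AddWithValue("@address", parts[0]);
                        command.Parameters.AddWithValue("@phone", parts[1]);
                        command.ExecuteNonQuery(); // one round trip (and one implicit transaction) per row
                    }
                }
            }
        }
    }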

Here is my question.

I logged the time of each "insert statement", and I can see that the "insert statement" takes longer as more data is inserted. Is this because the database has more data to sort out in the B-tree, since the table has a default clustered index on its primary key?

Please enlighten me about what happens before and after an insert statement in a SQL database.

Thank you Guys.

kisung Tae
  • Is UserId auto increment? What about the length and data type of the other columns? Both points are important in order to answer this. – KumarHarsh Aug 09 '18 at 11:53
  • @KumarHarsh Sorry for the late reply and thank you for your comment. Yes, the UserId is auto increment. The data types of the other columns are just normal nvarchar. The two columns are address and phone number, so pretty much every user has a similar length of data in those two columns. – kisung Tae Aug 09 '18 at 22:50

2 Answers


Is this because the database has more data to sort out in the B-tree, since the table has a default clustered index on its primary key?

No. In fact, an auto-increment UserId is an ideal choice for a clustered index (CI).

Since the primary key is auto-incremented, data will always be appended to the last page.

However, in the case of an UPDATE statement, a page split can happen if the new address is longer than the old one.

If possible, make Address a varchar and keep it as narrow as possible.

The main problem is the very frequent inserts, i.e. very frequent database hits. If there are 1000 records to be inserted, create a user-defined table type (UDT) and insert 50/100 rows at a time. You can do so by applying paging logic in the insert method. It is easy and will be helpful.
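
For illustration, here is a hedged sketch of batching with a table type and a stored procedure; the names dbo.UserTableType and dbo.InsertUsers and the column sizes are assumptions, not objects that already exist in your database:

    using System.Collections.Generic;
    using System.Data;
    using System.Data.SqlClient;

    static class BatchInsert
    {
        // Assumes these objects were created beforehand (names and sizes are assumptions):
        //   CREATE TYPE dbo.UserTableType AS TABLE (Address nvarchar(200), PhoneNumber nvarchar(50));
        //   CREATE PROCEDURE dbo.InsertUsers @Users dbo.UserTableType READONLY AS
        //     INSERT INTO [User] (Address, PhoneNumber)
        //     SELECT Address, PhoneNumber FROM @Users;
        public static void InsertBatch(string connectionString, IEnumerable<(string Address, string Phone)> users)
        {
            var table = new DataTable();
            table.Columns.Add("Address", typeof(string));
            table.Columns.Add("PhoneNumber", typeof(string));
            foreach (var user in users)
                table.Rows.Add(user.Address, user.Phone);

            using (var connection = new SqlConnection(connectionString))
            using (var command = new SqlCommand("dbo.InsertUsers", connection))
            {
                command.CommandType = CommandType.StoredProcedure;
                var parameter = command.Parameters.Add("@Users", SqlDbType.Structured);
                parameter.TypeName = "dbo.UserTableType";
                parameter.Value = table;

                connection.Open();
                command.ExecuteNonQuery(); // one round trip inserts the whole batch
            }
        }
    }

Called with batches of 50-100 rows at a time, this turns roughly 10,000 single-row round trips into a few hundred.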

Optimise your application-layer code: use connection pooling, use the relevant data types, and pass the length of each variable in the DAL (SqlParameter).
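
As a small sketch of that last point (the nvarchar lengths of 200 and 50 are assumptions; use your actual column sizes):

    using System.Data;
    using System.Data.SqlClient;

    static class ParameterExample
    {
        public static void InsertUser(SqlConnection connection, string address, string phone)
        {
            using (var command = new SqlCommand(
                "INSERT INTO [User] (Address, PhoneNumber) VALUES (@address, @phone)", connection))
            {
                // Declaring the exact type and length (instead of AddWithValue) keeps the parameter
                // definition stable, which avoids implicit conversions and plan-cache bloat.
                command.Parameters.Add("@address", SqlDbType.NVarChar, 200).Value = address;
                command.Parameters.Add("@phone", SqlDbType.NVarChar, 50).Value = phone;
                command.ExecuteNonQuery();
            }
        }
    }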

I logged the time of each "insert statement", and I can see that the "insert statement" takes longer as more data is inserted. Is this because the database has more data to sort out in the B-tree, since the table has a default clustered index on its primary key?

No. Since UserId is ever increasing, no sorting work happens. Maybe there is a fault in the insert SQL script. The main culprit is the very frequent database hits.

Please enlighten me about what happens before and after an insert statement in a SQL database.

Whenever data is inserted, the insert takes place in two places: at the table level in the data pages, and at the index level.

A clustered index stores the actual data rows of the table at the leaf level of the index. It also controls the sorting of the data within the data pages and the order of the pages themselves, based on the clustered index key.

How does an index page split happen? Suppose there are 3 intermediate-level pages and 4 leaf-level pages. If you insert 1 or 2 records now, nothing will happen; the insertion process at this stage will be fast.

Suppose you insert a few more records (say 10 or 20): the intermediate level and the leaf level will both grow. Because an index page has a space limit, when it can no longer accommodate new records it will split a page to make room for them. For this reason, column length should be as narrow as possible.
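
If you want to see how many levels and pages the clustered index actually has, a small sketch like the following (connection string assumed; index_id = 1 is the clustered index) queries sys.dm_db_index_physical_stats:

    using System;
    using System.Data.SqlClient;

    class IndexLevels
    {
        static void Main()
        {
            var connectionString = "Server=.;Database=MyDb;Integrated Security=true;"; // assumed
            const string sql = @"
                SELECT index_level, page_count, avg_fragmentation_in_percent
                FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.[User]'), 1, NULL, 'DETAILED')
                ORDER BY index_level;";

            using (var connection = new SqlConnection(connectionString))
            using (var command = new SqlCommand(sql, connection))
            {
                connection.Open();
                using (var reader = command.ExecuteReader())
                {
                    // Level 0 is the leaf level; the highest level is the root page of the B-tree.
                    while (reader.Read())
                        Console.WriteLine($"level {reader["index_level"]}: {reader["page_count"]} page(s), " +
                                          $"{reader["avg_fragmentation_in_percent"]}% fragmentation");
                }
            }
        }
    }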

But in your case the clustered index does not have to apply any sorting criteria, so that is one less piece of work performed by the clustered index.

Also, the index page split cost will be less than with a non-auto-increment key or a wide key.

Since you are inserting records very frequently, it will impact your performance every now and then.

In the case of a bulk insert, index page splits will be fewer, so performance will improve.

In a HEAP table, since there is no clustered index to maintain, there is one less task to do, so very frequent inserts might perform better.

But you have to weigh insert performance against select performance.

If this table is used very frequently to fetch records, then you have to keep the clustered index. If it is very seldom used, or has fewer than 100 records, a HEAP table is OK.

Further reading:

index structure and concepts

Index Architecture and Design Guide

Heaps (Tables without Clustered Indexes)

KumarHarsh
  • Thank you for your response. It has helped me understand the database better. Just one quick question: does "data is always appended to the last page" mean that the data is always appended as a leaf node of the B-tree? – kisung Tae Aug 10 '18 at 04:52
  • Because I assume that even if the UserId is auto increment, the database should still look through its B-tree based on the UserId to find out where to put the data. – kisung Tae Aug 10 '18 at 04:54
  • Sorry if I am misunderstanding; thank you for your help anyway. – kisung Tae Aug 10 '18 at 04:54
  • Thank you very much for your quick response, and take your time, please. Just in my understanding of a B-tree with a clustered index, every time data is appended the B-tree should be adjusted. Anyway, thank you. – kisung Tae Aug 10 '18 at 05:08

If your Word document includes the UserId (PRIMARY KEY) which then gets inserted into the table, I can see why this would be EXTRA slow.

Understanding CLUSTERED vs NON-CLUSTERED indexing.

In a CLUSTERED INDEX the physical rows of the table are rearranged according to the index. To use an everyday analogy, it is like arranging books (records) on a bookshelf in alphabetical order. Every time a new book comes in, you have to physically rearrange the other books so that the alphabetical order is properly maintained. Obviously this is VERY SLOW for inserts but really fast for SELECTs.

A Non-Clustered Index on the other hand does not alter the physical rows in a table when a new record comes in. Using the bookshelf as an analogy, if you wanted to look up books by author, you would keep a piece of paper on the side as an "index card" to look up the location of the book in the shelf that matches a specific author.

My solution to your problem, if you're going to insert a large number of records at once, is this (a sketch follows the list):

  1. DROP THE INDEX
  2. INSERT THE DATA IN BULK
  3. RE-CREATE THE INDEX
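
A hedged sketch of those three steps from C#, shown for a hypothetical non-clustered index named IX_User_Address (the clustered primary-key index itself would normally stay in place); SqlBulkCopy does the bulk load in step 2:

    using System.Data;
    using System.Data.SqlClient;

    static class BulkLoad
    {
        // 'users' is a DataTable with Address and PhoneNumber columns matching the [User] table.
        public static void Load(string connectionString, DataTable users)
        {
            using (var connection = new SqlConnection(connectionString))
            {
                connection.Open();

                // 1) Drop the (hypothetical) non-clustered index so the load does not have to maintain it.
                using (var drop = new SqlCommand("DROP INDEX IF EXISTS IX_User_Address ON [User];", connection))
                    drop.ExecuteNonQuery();

                // 2) Insert the data in bulk.
                using (var bulkCopy = new SqlBulkCopy(connection))
                {
                    bulkCopy.DestinationTableName = "[User]";
                    bulkCopy.BatchSize = 1000;
                    bulkCopy.ColumnMappings.Add("Address", "Address");
                    bulkCopy.ColumnMappings.Add("PhoneNumber", "PhoneNumber");
                    bulkCopy.WriteToServer(users);
                }

                // 3) Re-create the index once, after all rows are in.
                using (var create = new SqlCommand(
                    "CREATE NONCLUSTERED INDEX IX_User_Address ON [User] (Address);", connection))
                    create.ExecuteNonQuery();
            }
        }
    }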
Keith Harris