18

Ok, so say I have 100 rows to insert and each row has about 150 columns (I know that sounds like a lot of columns, but I need to store this data in a single table). The inserts will occur at random (i.e. whenever a set of users decides to upload a file containing the data), about 20 times a month. However, the database will be under continuous load processing other functions of a large enterprise application. The columns are varchars, ints, and a variety of other types.

Is the performance gain of wrapping these inserts in a transaction (as opposed to running them one at a time) going to be huge, minimal, or somewhere in between?

Why?

EDIT: This is for SQL Server 2005, but I'd be interested in 2000/2008 if there is something different to be said. Also, I should mention that I understand the point about transactions being primarily for data consistency, but I want to focus on the performance effects.

Maksym Gontar
  • 22,765
  • 10
  • 78
  • 114
Mark Rogers
  • 96,497
  • 18
  • 85
  • 138
  • 1
    Between 2000/2008, no, there isn't much of a difference that is noteworthy. Performance gains and transaction consistency will almost always be at odds with each other. The schema design, index architecture, and the storage methodologies/best practices will determine your performance gains. First rule of thumb when dealing with data: make your data requests and actions as small and as short as possible (OLTP). Then adapt accordingly. The ruling stands: don't put large record activity in transactions; transactions ARE for consistency and not for "rolling back" large batches of information. – SnapJag Apr 21 '10 at 00:45

6 Answers

22

It can have an impact, actually. The point of transactions is not about how many you do; it's about keeping the data updates consistent. If you have rows that need to be inserted together and are dependent on each other, those are the records you wrap in a transaction.

Transactions are about keeping your data consistent. This should be the first thing you think about when using transactions. For example, if you have a debit (withdrawal) from your checking account, you want to make sure the credit (deposit) is also done. If either of those doesn't succeed, the whole "transaction" should be rolled back. Therefore, both actions MUST be wrapped in a transaction.
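
To make that concrete, here is a minimal T-SQL sketch of the debit/credit example; the Accounts table, its columns, and the account IDs are made up for illustration.

```sql
-- Hypothetical schema: Accounts(AccountId INT PRIMARY KEY, Balance MONEY)
DECLARE @Amount MONEY;
SET @Amount = 100.00;

BEGIN TRY
    BEGIN TRANSACTION;

    -- Debit (withdrawal) from the checking account
    UPDATE Accounts SET Balance = Balance - @Amount WHERE AccountId = 1;

    -- Credit (deposit) to the savings account
    UPDATE Accounts SET Balance = Balance + @Amount WHERE AccountId = 2;

    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    -- If either statement fails, undo both so the accounts stay consistent
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;
END CATCH;
```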

When doing batch inserts, break them up into batches of 3000 to 5000 records and cycle through the set. 3000-5000 has been a sweet number range for me for inserts; don't go above that unless you've tested that the server can handle it. Also, I will put GOs in the batch at about every 3000 or 5000 records for inserts. For updates and deletes I'll put a GO at about every 1000, because they require more resources to commit.
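
As a rough sketch of that batching pattern, assume the uploaded data has already been bulk-loaded into a staging table dbo.StagingRows with an identity column RowId, and that dbo.TargetTable is the real destination; the table and column names here are hypothetical.

```sql
-- Move rows from staging to the target in ~5000-row chunks, committing each
-- chunk so no single transaction grows too large or runs too long.
DECLARE @BatchSize INT, @LastId INT, @MaxId INT;
SET @BatchSize = 5000;
SET @LastId = 0;
SELECT @MaxId = MAX(RowId) FROM dbo.StagingRows;

WHILE @LastId < @MaxId
BEGIN
    BEGIN TRANSACTION;

    INSERT INTO dbo.TargetTable (Col1, Col2 /* ...the remaining columns... */)
    SELECT Col1, Col2 /* ...the remaining columns... */
    FROM dbo.StagingRows
    WHERE RowId > @LastId AND RowId <= @LastId + @BatchSize;

    COMMIT TRANSACTION;

    SET @LastId = @LastId + @BatchSize;
END;
```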

If you're doing this from C# code, then in my opinion you should build a batch import routine instead of doing millions of inserts one at a time through code.

SnapJag
  • 797
  • 4
  • 14
  • I am trying to do millions of inserts with checks. That is, if a row exists in the destination table, then don't insert it. I was thinking of putting each insert in its own transaction. Could this be a serious problem? Thanks. – Steam Feb 25 '14 at 01:34
  • 2
    @Steam How about MERGE INTO? – Julia Hayward Jul 10 '14 at 10:18
  • @steam Use "batching" for large amounts, or keep it to single, normal, atomic transactions (like in my answer). If your source of data resides outside SQL, get the data into a staging table fast (no indexes/checks/etc.) using batching; then move that into the other tables that may have checks (see the sketch after these comments). Otherwise, disable checks and indexes until afterwards. KEY NOTE: avoid implicit transactions; explicit transactions should be the rule of thumb... https://technet.microsoft.com/en-us/library/2009.02.logging.aspx – SnapJag Oct 13 '16 at 00:55
  • @JuliaHayward MERGE INTO can be a bit heavy, and it actually has had some known issues with how it deals with consistency (https://www.mssqltips.com/sqlservertip/3074/use-caution-with-sql-servers-merge-statement/). So be cautious and test the results; simple uses are fine, but more complex uses of MERGE can get you stuck. So maybe use normal statements as much as possible, especially with VLDBs and large batch processing. We're getting into two forks of ideas: 1) transactions, 2) batch processing lots of records. – SnapJag Oct 13 '16 at 00:58
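
To make the "insert only if it doesn't already exist" idea from these comments concrete, here is one sketch that uses a staging table and NOT EXISTS in moderate batches instead of one transaction per row or MERGE; dbo.Staging, dbo.Destination, and BusinessKey are placeholder names, and BusinessKey is assumed to be unique within the staging data.

```sql
-- Copy only the rows that are not already in the destination, in batches;
-- each INSERT auto-commits, so no transaction stays open for long.
DECLARE @BatchSize INT;
SET @BatchSize = 5000;

WHILE 1 = 1
BEGIN
    INSERT INTO dbo.Destination (BusinessKey, Col1, Col2 /* ... */)
    SELECT TOP (@BatchSize) s.BusinessKey, s.Col1, s.Col2 /* ... */
    FROM dbo.Staging AS s
    WHERE NOT EXISTS (SELECT 1
                      FROM dbo.Destination AS d
                      WHERE d.BusinessKey = s.BusinessKey);

    IF @@ROWCOUNT = 0 BREAK;   -- nothing left to copy
END;
```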
17

While transactions are a mechanism for keeping data consistent, they actually have a massive impact on performance if they are used incorrectly or overused. I've just finished a blog post on the performance impact of explicitly specifying transactions as opposed to letting them occur naturally.

If you are inserting multiple rows and each insert occurs in its own transaction, there is a lot of overhead from locking and unlocking data. By encapsulating all of the inserts in a single transaction you can dramatically improve performance.
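
If you want to measure the difference on your own server, a rough test along these lines (not the exact benchmark from the blog post) will show it; dbo.TestInsert is a throwaway table created just for the comparison.

```sql
-- Throwaway table for the comparison
CREATE TABLE dbo.TestInsert (Id INT IDENTITY PRIMARY KEY, Payload VARCHAR(100));

DECLARE @i INT, @start DATETIME;

-- 1) One implicit (auto-commit) transaction per insert
SET @i = 0;
SET @start = GETDATE();
WHILE @i < 10000
BEGIN
    INSERT INTO dbo.TestInsert (Payload) VALUES ('row');
    SET @i = @i + 1;
END;
PRINT 'Auto-commit per row: ' + CAST(DATEDIFF(ms, @start, GETDATE()) AS VARCHAR(20)) + ' ms';

-- 2) The same inserts wrapped in one explicit transaction
SET @i = 0;
SET @start = GETDATE();
BEGIN TRANSACTION;
WHILE @i < 10000
BEGIN
    INSERT INTO dbo.TestInsert (Payload) VALUES ('row');
    SET @i = @i + 1;
END;
COMMIT TRANSACTION;
PRINT 'Single transaction: ' + CAST(DATEDIFF(ms, @start, GETDATE()) AS VARCHAR(20)) + ' ms';

DROP TABLE dbo.TestInsert;
```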

Conversely, if you have many queries running against your database and large transactions also occurring, they can block each other and cause performance issues.

Transactions are definitely linked with performance, regardless of their underlying intent.

undefined
  • 33,537
  • 22
  • 129
  • 198
  • 1
    I am trying to do millions of inserts with checks. That is, if a row exists in the destination table, then don't insert it. I was thinking of putting each insert in its own transaction. After reading this, I won't, just in case. – Steam Feb 25 '14 at 01:32
  • @KenWhite's answer says that transactions do not affect performance. So, I am still not sure which one is correct. – Steam Feb 25 '14 at 01:33
  • 1
    @blasto The best thing you can do is to try it out and see for yourself. When I did the testing, I found that doing many inserts in a single transaction halved the time compared to the same number of inserts each in its own transaction. Cf. http://blog.staticvoid.co.nz/2012/04/making-dapper-faster-with-transactions.html and also see this question: http://stackoverflow.com/questions/5091084/ado-net-sqltransaction-improves-performance – undefined Feb 25 '14 at 02:22
  • I just tried it now: in ONE transaction, I tried to insert 30K rows that were already there in the table. It is taking much longer than before. I think this happens despite having indexes on the necessary columns. – Steam Feb 25 '14 at 03:08
  • 2
    The indexes on the table can actually work against performance: on an insert, an update to each index happens (time and overhead). Maybe use EXISTS, though combining an insert and an EXISTS check in the same action can be hard. SHY away from doing 30K in one transaction; do 3K-5K. When indexes are updated, statistics become more stale. Sometimes, in the middle of the script, perform an index defrag or update stats (gently and fast, maybe a 2-5% sample). Or, if you're doing just INSERTs, consider setting the FILLFACTOR on your table and indexes to 80 or 70 to give them some space and avoid splitting pages. It takes time; stick with it. – SnapJag Oct 13 '16 at 01:05
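
For reference, the index maintenance suggested in the comment above looks roughly like this in T-SQL; the table and index names are placeholders, and a full rebuild takes the index offline unless your edition supports ONLINE = ON.

```sql
-- Leave free space in index pages so heavy inserting splits fewer pages
ALTER INDEX PK_Destination ON dbo.Destination
    REBUILD WITH (FILLFACTOR = 80);

-- Or a lighter-weight defrag that stays online
ALTER INDEX PK_Destination ON dbo.Destination REORGANIZE;

-- Periodically refresh statistics during a long batch run (small sample, fast)
UPDATE STATISTICS dbo.Destination WITH SAMPLE 5 PERCENT;
```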
4

It depends on what you call huge, but it will help (it really depends on the overall number of inserts you are doing). It will keep SQL Server from doing a commit after every insert, which over time adds up. With 100 inserts, you probably won't notice too much of an increase, depending on how often the inserts happen and what else is going on with the database.

kemiller2002
  • 113,795
  • 27
  • 197
  • 251
4

As others have said, transactions have nothing to do with performance, but instead have to do with the integrity of your data.

That being said, worrying about performance one way or the other when you're only talking about inserting 100 rows of data about 20 times a month (meaning 2000 records per month) is silly. Premature optimization is a waste of time; unless you have repeatedly tested the performance impact of these inserts (as small and infrequent as they are) and found them to be a major issue, don't worry about the performance. It's negligible compared to the other server load you mentioned.

Ken White
  • 123,280
  • 14
  • 225
  • 444
  • Yeah, that's what I thought, but I was told otherwise, which is part of the reason I asked the question. – Mark Rogers Feb 06 '09 at 20:40
  • 2
    No; apart from maintaining data integrity, transactions do have an impact on performance if you are inserting a large number of rows together. Doing them in a single transaction can decrease the total IOPS needed. – Pratik Singhal Apr 11 '18 at 13:44
  • @PratikSinghal: Not in the context of this poster's question. The number of rows being inserted is negligible. – Ken White Apr 11 '18 at 17:17
3

Transactions are not for performance but for data integrity. Depending on the implementation, there will be no real gain/loss of performance for only 100 rows (they will just be logged additionally, so they can all be rolled back).

Things to consider about the performance issues:

  • TAs (transactions) will interact with other queries
    • writing TAs will lock tuples/pages/files
  • commits might just be (depending on the lock protocol) an update of a timestamp
  • more log records might be written for TAs (one should be able to roll TAs back, but the DB probably logs extensively already, and sequential logging is cheap)
  • the degree of isolation (I know that one can switch this level in some DBs, and that nearly nobody uses level 3); see the sketch below

All in all: use TAs for ensuring the integrity.
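
On the isolation point, here is a hedged sketch of how the level can be chosen in SQL Server; the database name MyEnterpriseDb is made up, and switching on READ_COMMITTED_SNAPSHOT needs exclusive access to the database.

```sql
-- Option 1: let readers see committed row versions instead of waiting on
-- writers' locks (database-wide setting, SQL Server 2005+).
ALTER DATABASE MyEnterpriseDb SET READ_COMMITTED_SNAPSHOT ON;

-- Option 2: choose the isolation level per session before a transaction.
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;  -- or REPEATABLE READ, SERIALIZABLE, ...
BEGIN TRANSACTION;
    -- statements here take locks according to the chosen level
COMMIT TRANSACTION;
```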

Leonidas
  • 2,440
  • 15
  • 22
0

Practically: extremely. With large inserts, 100++ rows (provided that you have configured MySQL with an increased query size and transaction size to support monstrous queries/transactions; sorry, I don't remember the exact variable names), insert times can commonly be 10 times as fast, and even much more.

Marius Gri
  • 59
  • 3