11

What is the fastest way to do this:

  • One table, with no references that I cannot prefill (i.e. there is one reference key, but I have all the data for it filled in)
  • LOTS of data. We are talking about hundreds of millions of rows per day, coming in dynamically through an API
  • Requests must/should be processed as soon as feasible in a near-real-time scenario (i.e. no writing out to a file for upload once per day). 2 seconds is the normal maximum delay
  • Separate machines for data / application and the SQL Server

What I do now:

  • Aggregate up to 32*1024 rows into an array, then queue it.
  • Read the queue in 2-3 threads. Insert into database using SqlBulkCopy.
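A minimal sketch of that pipeline - the dbo.Ticks table name, its columns and the Add(...) entry point are assumptions for illustration, not the actual code:

using System;
using System.Collections.Concurrent;
using System.Data;
using System.Data.SqlClient;
using System.Threading.Tasks;

class TickLoader
{
    const int BatchSize = 32 * 1024;

    readonly string connectionString;
    // Bounded queue so the API-facing producer backs off if the writers fall behind.
    readonly BlockingCollection<DataTable> queue = new BlockingCollection<DataTable>(64);
    DataTable current = NewBatch();

    public TickLoader(string connectionString)
    {
        this.connectionString = connectionString;
    }

    static DataTable NewBatch()
    {
        var dt = new DataTable();
        dt.Columns.Add("timestamp", typeof(DateTime));
        dt.Columns.Add("instrument_ref", typeof(int));
        dt.Columns.Add("data_type", typeof(string)); // char(1) on the server
        dt.Columns.Add("price", typeof(double));
        dt.Columns.Add("volume", typeof(int));
        return dt;
    }

    // Producer: called for every tick coming in from the API; queues a full batch every 32*1024 rows.
    public void Add(DateTime timestamp, int instrumentRef, char dataType, double price, int volume)
    {
        current.Rows.Add(timestamp, instrumentRef, dataType.ToString(), price, volume);
        if (current.Rows.Count >= BatchSize)
        {
            queue.Add(current);
            current = NewBatch();
        }
    }

    // Consumer: run 2-3 of these; each one drains the queue with its own connection and SqlBulkCopy.
    public Task StartConsumer()
    {
        return Task.Run(() =>
        {
            foreach (DataTable batch in queue.GetConsumingEnumerable())
            {
                using (var conn = new SqlConnection(connectionString))
                {
                    conn.Open();
                    using (var bulk = new SqlBulkCopy(conn))
                    {
                        bulk.DestinationTableName = "dbo.Ticks";
                        bulk.BatchSize = BatchSize;
                        bulk.WriteToServer(batch);
                    }
                }
            }
        });
    }
}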

I get about 60k-75k rows imported per second, which is not enough, but quite close. I would love to hit 250,000 rows per second.

So far no resource is really maxed out. I see about 20% of the time blocked on "network I/O", and one core loaded to 80% on the CPU side. The disks are writing out 7 MB-14 MB and are mostly idle. The average queue length on a RAID 10 of 6 Raptors is... 0.25.

Does anyone have an idea how to speed this up? A faster server (so far it is virtual: 8 GB RAM, 4 cores, physical disk pass-through for the data)?


Adding some clarifications:

  • This is SQL Server 2008 R2 Enterprise on a Windows Server 2008 R2 machine with 4 cores and 8 GB RAM, all 64-bit. The 80% figure is the load on one core; averaged over the whole machine it shows up as about 20% CPU load.
  • The table is simple: it has no primary key, only an index on a relational reference (the instrument reference) and a timestamp that is unique within a set of instruments (so uniqueness is not enforced).
  • The fields in the table are: timestamp, instrument reference (no enforced foreign key), data type (char(1), one of a number of characters indicating what data is posted), price (double) and volume (int). As you can see this is a VERY thin table. The data in question is tick data for financial instruments.
  • The question is also about hardware etc., mostly because I see no real bottleneck. I am inserting in multiple transactions, and that gives me a benefit, but a small one. Disks and CPU are not showing significant load, while network I/O wait is high (300 ms/second, 30% at the moment), but this is on the same virtualization platform, which runs JUST these two servers and has enough cores to run everything. I am pretty much open to "buy another server", but I want to identify the bottleneck first... especially given that at the end of the day I am not grasping what the bottleneck is. Logging is irrelevant - the bulk inserts do NOT go into the transaction log as data (no clustered index).

Would partitioning the table horizontally help, for example by a byte (tinyint) field that splits the instrument universe across, say, 16 tables, so that I can do up to 16 inserts at the same time? As the data actually comes from different exchanges, I could make one partition per exchange. That would be a natural split field (it actually lives on the instrument, but I could duplicate the value here).
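A hedged sketch of what that split could look like on the client side - the per-exchange batches and the dbo.Ticks_00 ... dbo.Ticks_15 table names are assumptions for illustration:

using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using System.Threading.Tasks;

static class PartitionedInsert
{
    // One SqlBulkCopy per exchange, each into its own thin table, all running concurrently.
    public static Task InsertAsync(string connectionString, IDictionary<byte, DataTable> batchesByExchange)
    {
        var tasks = new List<Task>();
        foreach (var pair in batchesByExchange)
        {
            byte exchangeId = pair.Key;
            DataTable batch = pair.Value;
            tasks.Add(Task.Run(() =>
            {
                using (var conn = new SqlConnection(connectionString))
                {
                    conn.Open();
                    using (var bulk = new SqlBulkCopy(conn))
                    {
                        // e.g. dbo.Ticks_00 ... dbo.Ticks_15, one table per exchange
                        bulk.DestinationTableName = string.Format("dbo.Ticks_{0:D2}", exchangeId);
                        bulk.BatchSize = 32 * 1024;
                        bulk.WriteToServer(batch);
                    }
                }
            }));
        }
        return Task.WhenAll(tasks);
    }
}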


Some more clarifications: I got the speed even higher (90k rows per second); it is now clearly limited by network I/O between the machines, which could be VM switching.

What I do now is use a connection per 32k rows, put up a temp table, insert into it with SqlBulkCopy, THEN use ONE SQL statement to copy into the main table - this minimizes any lock times on the main table.
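For reference, a minimal sketch of that staging pattern, assuming a hypothetical dbo.Ticks destination table and a batch whose columns are in the same order as the staging table:

using System.Data;
using System.Data.SqlClient;

static class StagedInsert
{
    public static void InsertBatch(string connectionString, DataTable batch)
    {
        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();

            // 1. Create a session-local staging table; it disappears when the connection closes.
            using (var create = new SqlCommand(
                "CREATE TABLE #staging (ts datetime NOT NULL, instrument_ref int NOT NULL, " +
                "data_type char(1) NOT NULL, price float NOT NULL, volume int NOT NULL)", conn))
            {
                create.ExecuteNonQuery();
            }

            // 2. Bulk copy the ~32k-row batch into the staging table on the same connection
            //    (columns are mapped by ordinal, so the batch must match #staging's column order).
            using (var bulk = new SqlBulkCopy(conn))
            {
                bulk.DestinationTableName = "#staging";
                bulk.BatchSize = batch.Rows.Count;
                bulk.WriteToServer(batch);
            }

            // 3. One set-based statement moves the rows into the main table,
            //    keeping the lock time on dbo.Ticks as short as possible.
            using (var move = new SqlCommand(
                "INSERT INTO dbo.Ticks (ts, instrument_ref, data_type, price, volume) " +
                "SELECT ts, instrument_ref, data_type, price, volume FROM #staging", conn))
            {
                move.ExecuteNonQuery();
            }
        }
    }
}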

Most of the waiting time is still on network I/O. It seems I am running into issues on the VM side. I will move to physical hardware in the next months ;)

BenMorel
TomTom
  • Hell, are you really sure storing that data in a relational DB is the solution you really need? Can't you store the data at first in some kind of log files, and when you are going to analyse the data, run some kind of aggregation process on them, extracting only the relevant information into your database? – Doc Brown Jan 20 '11 at 21:07
  • Yes, but I would love not to. There is a LOT of stuff going on here, and it is also a good programming example. Also, when extracting logs for active use I may have to process 1-2 billion rows ASAP from a compressed binary format into relational data. Just trying to get the limits here. – TomTom Jan 20 '11 at 21:17
  • This is especially true because at the end I don't really see why it is not inserting faster. Even one core is not used up, the disks are not, and I have network I/O as the wait condition. I don't transfer a lot of data. This is a little thing for me to think about... ;) and fix. – TomTom Jan 20 '11 at 21:18
  • Looks like you're making good progress here! Please do consider trying the "table parameter insert" method; I can easily insert hundreds of thousands of rows per second on a moderately optimized database with no trouble using it. – Ted Spence May 08 '13 at 05:29

6 Answers

4

If you manage 70k rows per second, you're very lucky so far. But I suspect it's because you have a very simple schema.

I can't believe you're asking about this kind of load on:

  • virtual server
  • single array
  • SATA disks

The network and CPUs are shared and IO is restricted: you can't use all the resources. Any load stats you see are not very useful. I suspect the network load you see is traffic between the 2 virtual servers, and you'll become I/O bound if you resolve this.

Before I go on, read this: 10 lessons from 35K tps. He wasn't using a virtual box.

Here is what I'd do, assuming no SAN and no DR capability if you want to ramp up volumes.

  • Buy 2 big physical servers; the exact CPU is kind of irrelevant, but max out the RAM and go with an x64 install
  • Disks + controllers = fastest spindles, fastest SCSI. Or a stonking great NAS
  • Gigabit (1000 Mbit) or faster NICs
  • RAID 10 with 6-10 disks for one log file, for your database only
  • Remaining disks in RAID 5 or RAID 10 for the data file

For reference, our peak load is 12 million rows per hour (16 core, 16GB, SAN, x64) but we have complexity in the load. We are not at capacity.

gbn
  • Sorry, I/O is unrestricted. A single array of SATA disks is not so little if the disks are 10k Raptors; there are 10 disks used only by the SQL Server, AND the disks are not busy according to the RAID controller. This IS a network issue - the wait times clearly point at network I/O ;) – TomTom Feb 10 '11 at 09:52
2

From the answers I read here, it seems you really have a hardware problem rather than a code problem. Ideally, you'll get your performance boosts by making available more disk I/O or network bandwidth, or by running the program on the same virtual machine that hosts the database.

However I do want to share the idea that table parameter inserts are really ideal for big data transfer; although SqlBulkCopy appears to be just as fast, it's significantly less flexible.

I wrote an article about the topic here: http://www.altdevblogaday.com/2012/05/16/sql-server-high-performance-inserts/

The overall approach is roughly this: first, create a table type:

CREATE TYPE item_drop_bulk_table_rev4 AS TABLE (
    item_id BIGINT,
    monster_class_id INT,
    zone_id INT,
    xpos REAL,
    ypos REAL,
    kill_time datetime
)

Then, you create a stored procedure to copy from the table parameter into the actual table directly, so there are fewer in-between steps:

CREATE PROCEDURE insert_item_drops_rev4
    @mytable item_drop_bulk_table_rev4 READONLY
AS

INSERT INTO item_drops_rev4 
    (item_id, monster_class_id, zone_id, xpos, ypos, kill_time)
SELECT 
    item_id, monster_class_id, zone_id, xpos, ypos, kill_time 
FROM 
    @mytable

The client-side C# code looks like this:

DataTable dt = new DataTable();
dt.Columns.Add(new DataColumn("item_id", typeof(Int64)));
dt.Columns.Add(new DataColumn("monster_class_id", typeof(int)));
dt.Columns.Add(new DataColumn("zone_id", typeof(int)));
dt.Columns.Add(new DataColumn("xpos", typeof(float)));
dt.Columns.Add(new DataColumn("ypos", typeof(float)));
dt.Columns.Add(new DataColumn("timestamp", typeof(DateTime)));

for (int i = 0; i < MY_INSERT_SIZE; i++) {
    dt.Rows.Add(new object[] { item_id, monster_class_id, zone_id, xpos, ypos, DateTime.Now });
}

// Now we&#039;re going to do all the work with one connection!
using (SqlConnection conn = new SqlConnection(my_connection_string)) {
    conn.Open();
    using (SqlCommand cmd = new SqlCommand("insert_item_drops_rev4", conn)) {
        cmd.CommandType = CommandType.StoredProcedure;

        // Adding a "structured" parameter allows you to insert tons of data with low overhead
        SqlParameter param = new SqlParameter("@mytable", SqlDbType.Structured);
        param.Value = dt;
        cmd.Parameters.Add(param);
        cmd.ExecuteNonQuery();
    }
}
Ted Spence
1

It is all slow.

Some time ago we solved a similar problem (inserting tens of thousands of price records into the DB; as I remember it was about 50K per time frame, and we had about 8 time frames that all clashed at :00, so it was about 400K records) and it worked very, very fast for us (MS SQL 2005). Imagine how it would work today (SQL 2012):

<...init...>
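// Initialize a bulk-copy operation on the ODBC connection handle for TableName;
// DB_IN means the data flows from the program into the database.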
if(bcp_init(m_hdbc, TableName, NULL, NULL, DB_IN) == FAIL)
    return FALSE;

int col_number = 1;

// Bind each field of the in-memory struct to its column
// (args: variable address, indicator length, data length, terminator, terminator length, type token, column ordinal)
if(bcp_bind(m_hdbc, (BYTE *)&m_sd.SymbolName, 0, 16, (LPCBYTE)"", 1, 0, col_number++) == FAIL) return FALSE;
if(bcp_bind(m_hdbc, (BYTE *)&m_sd.Time, 0, 4, 0, 0, 0, col_number++) == FAIL) return FALSE;
if(bcp_bind(m_hdbc, (BYTE *)&m_sd.Open, 0, 8, 0, 0, 0, col_number++) == FAIL) return FALSE;
if(bcp_bind(m_hdbc, (BYTE *)&m_sd.High, 0, 8, 0, 0, 0, col_number++) == FAIL) return FALSE;
if(bcp_bind(m_hdbc, (BYTE *)&m_sd.Low, 0, 8, 0, 0, 0, col_number++) == FAIL) return FALSE;
if(bcp_bind(m_hdbc, (BYTE *)&m_sd.Close, 0, 8, 0, 0, 0, col_number++) == FAIL) return FALSE;
if(bcp_bind(m_hdbc, (BYTE *)&m_sd.Volume, 0, 8, 0, 0, 0, col_number++) == FAIL) return FALSE;


<...save into sql...>
BOOL CSymbolStorage::Copy(SQL_SYMBOL_DATA *sd)
{
    if(!m_bUseDB)
        return TRUE;

    memcpy(&m_sd, sd, sizeof(SQL_SYMBOL_DATA));

    if(bcp_sendrow(m_hdbc) != SUCCEED)
        return FALSE;

    return TRUE;
}
Dima
1

Are there any indexes on the table that you could do without? EDIT: asking while you were typing.

Is it possible to turn the price into an integer, and then divide by 1000 or whatever on queries?
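A tiny sketch of that idea, assuming a scale factor of 1000 and a hypothetical price_scaled int column:

using System;

static class PriceScaling
{
    const int Scale = 1000; // 3 decimal places; adjust to the instrument's tick size

    // On insert: store the price as a scaled integer in an int column (hypothetical price_scaled).
    public static int ToScaled(double price)
    {
        return (int)Math.Round(price * Scale);
    }

    // On read: scale back in the client...
    public static double FromScaled(int priceScaled)
    {
        return priceScaled / (double)Scale;
    }

    // ...or directly in the query, e.g.
    // SELECT CAST(price_scaled AS float) / 1000.0 AS price FROM dbo.Ticks
}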

Tim
  • Actually there is a chance to make the price an int and the coding type a tinyint - that has been thought of. As the price is not indexed, should it make a big difference? Will give it a try, though. – TomTom Jan 20 '11 at 14:15
  • Where is the .NET client executing? – Tim Jan 20 '11 at 22:57
  • Same machine, separate VM. Moving to physical hardware soon - not for performance, but I have an issue with the clock "moving" in a VM, which is bad if the data you get has 25ms timestamp increments and the library receiving it marks data as "suspicious" for fluctuating timestamp delays. – TomTom Feb 10 '11 at 09:53
  • 1
    Not sure what your plans are, but if/when client and server processes run on separate machines then, of course, LAN latency may become the biggest performance bottleneck, though you might be able to tune performance by sizing datarows to make most efficient use of packet size. For high speed bulk loads, it would be better to have the client and server running on the same machine. In my experience, there's always plenty of CPU on the client and the server, and it's the disk subsystem and/or the transport layer where slowdowns occur. – Tim Feb 10 '11 at 12:07
1

Have you tried adding a pk to the table? Does that improve the speed?

There is also a set-based way to use tally tables to import CSV data, described at http://www.sqlservercentral.com/articles/T-SQL/62867/ (near the bottom; requires free registration, but worth it).

You might like to try that and test its performance... with a small, properly indexed tally table.

Ruskin
  • PK not added... will try it. CSV is bad; I don't want the app to be able to manipulate the server file system. – TomTom Jan 20 '11 at 14:16
  • 2
    Usually adding any form of index or primary key will slow the insert rate down. – andora Jun 21 '11 at 21:53
0

Could you use horizontal partitioning? See: http://msdn.microsoft.com/en-us/library/ms178148.aspx & http://msdn.microsoft.com/en-us/library/ms188706.aspx

You might also want to look at this question, and possibly change the recovery model: Sql Server 2008 Tuning with large transactions (700k+ rows/transaction)

Some questions: What edition of SQL Server are you using?

Why is the one core at 80%? That might be the bottleneck, so is probably something worth investigating.

What OS are you using, and is it 64 bit?

Bravax
  • Editing question in response. Note that the 80% core is an average taken from system load. The machine has 4 cores and shows 20% cpu load average. – TomTom Jan 20 '11 at 13:52