
We've come across a perplexing problem with our database application. We're running the application in 3 different (but similarly spec'd) environments.

        Select  Insert
Env1    0.81ms  0.94ms
Env2    0.69ms  5.79ms
Env3    0.52ms  15.61ms

We benchmarked disk and memory performance in the 3 environments. Env3 is slightly faster than Env2, which is slightly faster than Env1 (by no more than 10%). As you might expect, Selects in Env3 are slightly faster than in Env2, which are slightly faster than in Env1. However, Inserts in Env1 are 5 times quicker than in Env2 and 15 times quicker than in Env3. What could possibly account for the very poor performance of writes in Env2 and Env3?

Some background information in case it's relevant.

  1. The figures above were gathered by a test application we wrote to try to diagnose the problem. The Selects & Inserts are basic and are performed on a single small table.

  2. Our test application...

    a) was written in C# and uses the System.Data.SqlClient.SqlConnection based classes.
    b) calculated the average times by carrying out the action 1000 times and dividing the time taken by 1000.
    c) worked in 2 modes...
    i. Our code called the insert/select/update 1000 times
    ii. Our code called a single stored procedure which called the insert/select/update 1000 times
    d) runs on the same machine as the database

  3. Updates suffered similarly to Inserts, except when the stored procedure called the update. When the stored procedure called the update in Env3 it was 4 times slower than in Env1, but when the client called the 1000 updates it was 17 times slower. The expanded table below illustrates the difference.

          Select  SP Select  Insert    SP Insert  Update    SP Update
    Env1  0.81ms  0.015ms     0.94ms    0.72ms     0.83ms    0.04ms
    Env2  0.69ms  0.019ms     5.79ms    5.32ms     7.42ms    0.09ms
    Env3  0.52ms  0.020ms    15.61ms   14.37ms    15.27ms    0.15ms
    
  4. We set up Environment 1; our client set up Environments 2 and 3. Environment 2 is a VMware image. When we first noticed how slow inserts and updates were, we brought a copy of the image to our hosting centre. The performance problems disappeared.

  5. Each environment is running SQL Server 2005 Standard Edition (64-bit) on Windows Server 2008 (64-bit).

  6. The major differences between the environments are:

    Env1: Running in a hypervisor on a Dell T610 with 2 quad-core Intel Xeon E5520 processors. (The VM has 3 dedicated cores and 6 GB of dedicated RAM.)
    Env2: Running on VMware vSphere on an HP ProLiant DL585 G2 with 4 dual-core 2.8 GHz AMD Opteron 8220 processors. (The VM has 3 dedicated cores and 6 GB of dedicated RAM.)
    Env3: Running on a physical box - an HP ProLiant DL380 G5, 2 * dual-core Intel Xeon 3 GHz processors, 6 GB RAM.
    Env1 uses RAID 0; Env2 & 3 use RAID 5.
    When we copied the Env2 image to our server, we ran it using VMware Player. We could not replicate the performance problems.
    Env1 runs on our network; Env2 & 3 run on our client's network.

  7. We tried Shared Memory, Named Pipes and TCP as the communication mechanism with no appreciable difference.

  8. We've tried rebuilding indexes, dropping and recreating tables, etc. The poor performance happens for all inserts/updates on all tables.

We have 2 questions...

1) What could possibly account for the very poor performance of writes in Env2 and Env3?
2) What tool can we use to see where all those extra milliseconds are being used up?

Thanks for any help you can give us on this one.

6 Answers


This is what I would investigate: "Env1 uses RAID 0; Env2 & 3 use RAID 5."

RAID 5 is slower on writes.

Remember that the database is just an interface to the disk drives; the disks matter far more than RAM or CPU, especially for writes against a small table.

Try simpler setups and vary the RAID settings, possibly even trying no RAID, to get a complete set of data points.

QUICK EDIT: For a more randomized test, and to ensure you are testing disk rather than RAM or CPU, try lowering RAM and CPU and using a table with 1 million or even 10 million rows, selecting rows at random to increase the chance the server has to go to disk for each and every row.
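
A rough sketch of such a randomized test might look like the following (the table name, payload size, and row count here are illustrative, not from the question):

```sql
-- Build a test table large enough that it cannot be fully cached
-- (run in a scratch database; all names are illustrative).
CREATE TABLE dbo.IoTest (
    id      INT IDENTITY(1,1) PRIMARY KEY,
    payload CHAR(200) NOT NULL DEFAULT ('x')
);

-- Populate ~1 million rows via a cross join of system catalogs.
INSERT INTO dbo.IoTest (payload)
SELECT TOP (1000000) 'x'
FROM sys.all_objects AS a
CROSS JOIN sys.all_objects AS b;

-- Fetch a random row; repeated with a cold cache, each lookup
-- is likely to force a physical read.
SELECT payload
FROM dbo.IoTest
WHERE id = CAST(RAND() * 1000000 AS INT) + 1;
```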


Food for thought: SELECTs are served from memory; INSERTs must flush the log to disk. Check sys.dm_io_virtual_file_stats in the 3 environments. I expect you'll discover dramatic differences in the io_stall_write_ms column.
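
A minimal query along those lines, filtered to the transaction log files since that is where inserts stall (the file_id = 2 convention is an assumption that the databases have a single log file):

```sql
-- Compare write stalls per database log file across the 3 environments.
SELECT DB_NAME(database_id) AS db,
       file_id,
       num_of_writes,
       io_stall_write_ms,
       io_stall_write_ms / (1.0 + num_of_writes) AS avg_write_stall_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL)
WHERE file_id = 2  -- conventionally the log file
ORDER BY avg_write_stall_ms DESC;
```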

Remus Rusanu

The two servers using RAID 5 are definitely a concern (RAID 5 gives fast reads but slow writes). You would get better performance from just about any other RAID configuration (though RAID 0 has no fault tolerance).

Changing the RAID will likely get you the performance you want, but I thought I would add something else that I don't believe has been said yet:

Indexes, while great for selects, hurt every write operation (you write the data, and the indexes must also be updated). So even if swapping out the RAID does not give you the performance you are expecting, I would look at indexes next.
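
As a quick way to compare index overhead between the environments, something like this inventory of nonclustered indexes per user table could help (a sketch; adapt to your schema):

```sql
-- Every extra nonclustered index adds work to each insert/update,
-- so large counts here are worth comparing across environments.
SELECT OBJECT_NAME(i.object_id) AS table_name,
       COUNT(*)                 AS nonclustered_indexes
FROM sys.indexes AS i
WHERE i.type = 2  -- nonclustered
  AND OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
GROUP BY i.object_id
ORDER BY nonclustered_indexes DESC;
```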

  • Indexes always represent a tradeoff in terms of reads/writes but don't forget that it's possible for INSERTS to use an appropriate index – Mitch Wheat Jan 26 '11 at 23:21

Run Glenn Berry's query to get an idea of each database's read/write IO stalls:

-- Calculates average stalls per read, per write, and per total I/O
-- for each database file.
-- Helps determine which database files on the entire instance have
-- the most I/O bottlenecks.
SELECT DB_NAME(fs.database_id) AS [Database Name],
       mf.physical_name,
       io_stall_read_ms,
       num_of_reads,
       CAST(io_stall_read_ms / (1.0 + num_of_reads) AS NUMERIC(10,1)) AS [avg_read_stall_ms],
       io_stall_write_ms,
       num_of_writes,
       CAST(io_stall_write_ms / (1.0 + num_of_writes) AS NUMERIC(10,1)) AS [avg_write_stall_ms],
       io_stall_read_ms + io_stall_write_ms AS [io_stalls],
       num_of_reads + num_of_writes AS [total_io],
       CAST((io_stall_read_ms + io_stall_write_ms) / (1.0 + num_of_reads + num_of_writes) AS NUMERIC(10,1)) AS [avg_io_stall_ms]
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS fs
INNER JOIN sys.master_files AS mf
    ON fs.database_id = mf.database_id
   AND fs.[file_id] = mf.[file_id]
ORDER BY avg_io_stall_ms DESC OPTION (RECOMPILE);
Mitch Wheat

The very first thing I would do is ensure that the indexes are exactly the same in each environment, and check what the fill factor is on each database server. Also, is the amount of data in each environment the same, or close to it?
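
One way to check index definitions and fill factors side by side is to run a catalog query like this in each environment and diff the output (a sketch using the standard catalog views):

```sql
-- List every user-table index with its fill factor;
-- identical environments should produce identical output.
SELECT OBJECT_NAME(i.object_id) AS table_name,
       i.name                   AS index_name,
       i.type_desc,
       i.fill_factor
FROM sys.indexes AS i
WHERE OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
ORDER BY table_name, index_name;
```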

Thirster42

I see a lot of questions about the disks and timings, but I'd like to know what the disks are actually doing. What is their average access time? How long does an I/O request wait before it is serviced? How many I/O requests are queued up? There are also questions about how busy the CPU is and what's happening in your memory subsystem. All of these can be answered easily by running collectl in another window while you perform your tests. Changing a few settings and rerunning the tests to see which ones make a difference could help, but you really need to get to the root cause.

For example, someone mentioned RAID 0 vs RAID 5. That is indeed a biggie. How do these two subsystems compare in the absence of your workload? Try doing simple reads/writes to each with a tool like dd, dt, or even iozone, and then observe what's happening with a tool like collectl or even iostat. If there is a big difference between the performance of the two (and I know RAID 0 will be faster on writes; RAID 5 should win on reads), you can do all the tuning you like to the DB and you're never going to exceed what the disks can do.

-mark

mark seger