
Overview

This question is a more specific version of this one:

But I've noticed the same performance hit for other data types (and, in fact, in my case I'm not using any bigint types at all).

Here are some other questions that seem like they should cover the answer to this question, but I'm observing the opposite of what they indicate:

Context

I've got some C# code for inserting data into a table. The code is itself data-driven in that some other data specifies the target table into which the data should be inserted. So, though I could use dynamic SQL in a stored procedure, I've opted to generate dynamic SQL in my C# application.

The command text is always the same for every row I insert, so I generate it once, before inserting any rows. The command text is of the form:

INSERT SomeSchema.TargetTable ( Column1, Column2, Column3, ... )
VALUES ( SomeConstant, @p0, @p1, ... );
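For illustration, here's a minimal sketch of how that one-time generation might look; the helper name, the schema/table/column arguments, and the assumption that the first column always takes a constant are illustrative, not my actual code:

```
using System;
using System.Linq;
using System.Text;

static class InsertSqlBuilder
{
    // Hypothetical helper: build the INSERT text once from the table metadata
    // that the data-driven configuration supplies.
    public static string Build(string schemaName, string tableName, string[] columnNames)
    {
        var columnList = string.Join(", ", columnNames);

        // The first column gets a constant; each remaining column becomes @p0, @p1, ...
        var valueList = "SomeConstant, " + string.Join(
            ", ", columnNames.Skip(1).Select((_, i) => "@p" + i));

        return new StringBuilder()
            .AppendLine($"INSERT {schemaName}.{tableName} ( {columnList} )")
            .Append($"VALUES ( {valueList} );")
            .ToString();
    }
}
```

Generating this once and reusing it for every row keeps the per-row work down to creating the parameter values.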

For each insert, I create an array of SqlParameter objects.

For the 'nvarchar' behavior, I'm just using the SqlParameter(string parameterName, object value) constructor method, and not setting any other properties explicitly.

For the 'degenerate' behavior, I was using the SqlParameter(string parameterName, SqlDbType dbType) constructor method and also setting the Size, Precision, and Scale properties as appropriate.

For both versions of the code, the value either passed to the constructor method or separately assigned to the Value property has a type of object.
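To make the two variants concrete, here is a sketch of both construction styles; the parameter names, literal values, and type metadata below are illustrative only:

```
using System.Data;
using System.Data.SqlClient; // Microsoft.Data.SqlClient in newer projects

// 'nvarchar' style: only a name and a value are supplied, so ADO.NET infers
// the SQL type from the runtime type of the value. In my case the values
// were strings, so every parameter was sent as nvarchar.
object looseValue = "123.45";
var looseParam = new SqlParameter("@p0", looseValue);

// 'type-specific' style: the SqlDbType is declared up front and the relevant
// metadata (Precision/Scale here, or Size for character types) is set explicitly.
object typedValue = 123.45m;
var typedParam = new SqlParameter("@p0", SqlDbType.Decimal)
{
    Precision = 18,
    Scale = 2,
    Value = typedValue
};

// A character column gets Size instead of Precision/Scale.
var textParam = new SqlParameter("@p1", SqlDbType.NVarChar)
{
    Size = 100,
    Value = "some text"
};
```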

The 'nvarchar' version of the code takes about 1-1.5 minutes. The 'degenerate' or 'type-specific' version takes longer than 9 minutes, so it's 6-9 times slower.

SQL Server Profiler doesn't reveal any obvious culprits. The type-specific code is generating what would seem like better SQL, i.e. a dynamic SQL command whose parameters contain the appropriate data type and type info.

Hypothesis

I suspect that, because I'm passing a value typed as object as the parameter value, the ADO.NET SQL Server client code is casting, converting, or otherwise validating the value before generating and sending the command to SQL Server. I'm surprised, though, that the conversion from nvarchar to each of the relevant target-table column types that SQL Server must be performing is so much faster than whatever the client code is doing.

Notes

I'm aware that SqlBulkCopy is probably the best-performing option for inserting large numbers of rows but I'm more curious why the 'nvarchar' case out-performs the 'type-specific' case, and my current code is fast enough as-is given the amount of data it routinely handles.
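For completeness, this is a minimal sketch of the kind of SqlBulkCopy usage I'd reach for if throughput ever became a problem; the connection string, table, and columns are placeholders:

```
using System.Data;
using System.Data.SqlClient;

// Placeholder data; in practice this would be streamed via a DataTable or
// a custom IDataReader implementation.
var table = new DataTable();
table.Columns.Add("Column2", typeof(int));
table.Columns.Add("Column3", typeof(string));
table.Rows.Add(1, "first row");
table.Rows.Add(2, "second row");

using (var connection = new SqlConnection("Server=.;Database=MyDb;Integrated Security=true"))
{
    connection.Open();
    using (var bulkCopy = new SqlBulkCopy(connection))
    {
        bulkCopy.DestinationTableName = "SomeSchema.TargetTable";
        bulkCopy.ColumnMappings.Add("Column2", "Column2");
        bulkCopy.ColumnMappings.Add("Column3", "Column3");
        bulkCopy.WriteToServer(table);
    }
}
```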

Kenny Evitt

  • Did you also compare the performance for semi-bulk inserts such as these? `INSERT SomeSchema.TargetTable ( Column1, Column2, Column3, ... ) values ( SomeConstant, @p0, @p1, ... ), ( SomeConstant, @p0, @p1, ... ), ( SomeConstant, @p0, @p1, ... ), ( SomeConstant, @p0, @p1, ... );` Personally, I noted that the best performance was given with chunks of 500kb to 800kb at a time. – Ralph Mar 14 '16 at 20:31
  • @Ralph I did not. That's an interesting, and good, idea but it would be a non-trivial amount of work for me to implement. My code needs to handle existing tables that span a considerable range of 'width', i.e. some tables contain only a few columns and others contain more than 200 columns. I think I'd just bite the bullet and use `SqlBulkCopy` with a custom `IDataReader` implementation so I could { semi-bulk / chunk } insert data 'streamed' from my sources. – Kenny Evitt Mar 14 '16 at 20:43
  • Use a profiler to see what specific areas take longer. – Martin Smith Mar 14 '16 at 20:58
  • @MartinSmith That was an excellent suggestion. The Visual Studio performance Profiler seemed to indicate that some LINQ extension method in my insert method was significantly slower. After replacing the extension method, the type-specific code completed in a little less than 2 minutes. It's still 30-40% slower, but it seems very likely that my code for generating the parameters is to blame. – Kenny Evitt Mar 14 '16 at 22:09
  • What do you mean by "conversion from nvarchar to each of the relevant target table column types" ? You really should be setting max size for the NVARCHAR stuff. Also, there is a performance hit for passing datatypes that have no direct mapping: object/SQL_VARIANT, DATETIME (instead of DATETIME2), DECIMAL, Guid/UNIQUEIDENTIFIER, and a few others. – Solomon Rutzky Mar 14 '16 at 23:34
  • @srutzky The original code generates SQL for which all of the parameters are `nvarchar`. Given that most of the column data types are not `nvarchar`, SQL Server must be converting most of the parameter values when it executes the SQL. But that still seems to be very fast as the other code, the code that supplies values of the same type as the corresponding table column, is slower. Tho that seems like it's probably my fault. – Kenny Evitt Mar 15 '16 at 00:06
  • @srutzky What do you mean by "there is a performance hit for **passing** datatypes that have no direct mapping" – passing *where*? Passing to `SqlParameter` or passing to SQL Server directly? And are you claiming that `Decimal` in .NET doesn't correspond to `decimal` in SQL Server, or not directly, and that it's faster to pass the same value as a string value using an `nvarchar` parameter instead? And that the same is true for `DateTime` and `datetime` and `Guid` and `uniqueidentifier`? – Kenny Evitt Mar 15 '16 at 00:09
  • Kenny, I never tried sending in the _wrong_ datatype ;-), so I am not making any specific claim about the implicit conversion happening within SQL Server. However, yes, I am saying that passing in, via `SqlParameter`, the specific .NET types of `Decimal`, `Guid`, object (mapped to `SQL_VARIANT`), `DateTime` (if mapped to `DATETIME` or `SMALLDATETIME`), and some others, has a very definite impact as they are not the same (i.e. different precision, capacity, structure and/or comparison method (relates to `Guid`)), so I suspect there is extra validation going on there. – Solomon Rutzky Mar 15 '16 at 01:36
  • PS, regarding `SqlBulkCopy` being the "fastest" method to get data into SQL Server, please see the following answers here on S.O.: **1)** [How can I insert 10 million records in the shortest time possible?](http://stackoverflow.com/a/25773471/577765), **2)** [Pass Dictionary to Stored Procedure T-SQL](http://stackoverflow.com/a/25815939/577765), **3)** [Storing a Dictionary or KeyValuePair in a database](http://stackoverflow.com/a/26922861/577765), and **4)** [Bulk insert strategy from c# to SQL Server](http://stackoverflow.com/a/28678916/577765) :-) – Solomon Rutzky Mar 15 '16 at 17:44
  • @srutzky Thanks but in my case I need to support arbitrary tables so I can't use a TVP. I guess I could dynamically create a TVP (and a stored procedure) ... – Kenny Evitt Mar 15 '16 at 18:35
  • @KennyEvitt Yes, you should be able to dynamically create the TVP (then drop when the process is done). But you don't need to use a Stored Procedure. A TVP can be used in an ad hoc SQL statement. – Solomon Rutzky Mar 15 '16 at 18:45

2 Answers


The answer does depend on the database you are running, but it has to do with the character encoding process. SQL Server introduced the NVarChar and NText field types to handle UTF-16 encoded data, and UTF-16 also happens to be the internal string representation of the .NET CLR. NVarChar and NText values therefore don't have to be converted to another character encoding, a step that takes a very short but measurable amount of time.

Other databases let you define the character encoding at the database level, and some let you define it on a column-by-column basis. The performance differences really depend on the driver.

Also important to note:

  • Inserting with a prepared statement emphasizes any inefficiency in converting to the database's internal format.
  • This has no bearing on how efficiently the database queries against a string; UTF-16 does take up more space than the default Windows-1252 encoding used for Text and VarChar.
  • Of course, in a global application, Unicode support is necessary.
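To make the encoding point concrete, here is a minimal sketch (the parameter names, sizes, and value are arbitrary) of the same string declared as NVARCHAR versus VARCHAR:

```
using System.Data;
using System.Data.SqlClient;

// A .NET string is UTF-16 internally, which lines up directly with NVARCHAR:
var nvarcharParam = new SqlParameter("@name", SqlDbType.NVarChar, 100)
{
    Value = "Ünïcödé text"
};

// Declaring the same parameter as VARCHAR forces the value through a
// code-page conversion, and characters outside that code page can be
// silently replaced:
var varcharParam = new SqlParameter("@name", SqlDbType.VarChar, 100)
{
    Value = "Ünïcödé text"
};
```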
Berin Loritsch
  • I think I get at what you're writing, and it certainly makes sense for inserting data into `varchar` columns. But the specific table for which I first noticed this discrepancy contains columns with lots of different data types, e.g. `decimal`, `int`, `bit` – and, now that I just checked, all of its 'text' columns are of the `nvarchar` type (and not `nvarchar(MAX)`). – Kenny Evitt Mar 14 '16 at 20:39
  • All things being equal, and only nvarchar vs varchar being different, you'll see the price of encoding. I'm assuming (since you didn't provide actual performance numbers) that the difference is small, but measurable. – Berin Loritsch Mar 14 '16 at 20:45
  • I don't have any `varchar` types – so my 'type-specific' code shouldn't be doing any encoding. – Kenny Evitt Mar 14 '16 at 21:21

They're Not (but They're Almost as Fast)

My original discrepancy was entirely my fault. The way I was creating the SqlParameter objects for the 'degenerate' or 'type-specific' version of the code used one more loop than the 'nvarchar' version. Once I rewrote the type-specific code to use the same number of loops (one), the performance was almost the same. [About 1-2% slower now instead of 500-800% slower.]

A slightly modified version of the type-specific code is now a little faster, at least based on my (limited) testing: about 3-4% faster for ~37,000 command executions.
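For illustration, here is a sketch of the single-loop parameter construction described above; ColumnInfo and the Build method are simplified stand-ins rather than my actual code:

```
using System;
using System.Data;
using System.Data.SqlClient;

// Simplified stand-in for the per-column metadata, looked up once before
// any rows are inserted.
sealed class ColumnInfo
{
    public SqlDbType DbType;
    public int Size;
    public byte Precision;
    public byte Scale;
}

static class ParameterBuilder
{
    // Single pass over the columns: each parameter is created and given its
    // type metadata and value in the same loop, instead of one loop to create
    // the parameters and another to assign types and values.
    public static SqlParameter[] Build(ColumnInfo[] columns, object[] rowValues)
    {
        var parameters = new SqlParameter[columns.Length];

        for (var i = 0; i < columns.Length; i++)
        {
            parameters[i] = new SqlParameter("@p" + i, columns[i].DbType)
            {
                Size = columns[i].Size,
                Precision = columns[i].Precision,
                Scale = columns[i].Scale,
                Value = rowValues[i] ?? DBNull.Value
            };
        }

        return parameters;
    }
}
```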

But it's still (a little) surprising that it's not even faster, as I'd expect SQL Server converting hundreds of nvarchar values to lots of other data types (for every execution) to be significantly slower than the C# code to add type info to the parameter objects. I'm guessing it's really hard to observe much difference because the time for SQL Server to convert the parameter values is fairly small relative to the time for all of the other code (including the SQL client code communicating with SQL Server).

One lesson I hope to remember is that it's very important to compare like with like.

Another apparent lesson is that SQL Server is pretty fast at converting text to its various other data types.

Kenny Evitt