7

The DB I use has French_CI_AS collation (CI should stand for Case-Insensitive) but is case-sensitive anyway. I'm trying to understand why.

The reason I assert this is that bulk inserts with a 'GIVEN' case setup fail, but they succeed with another 'Given' case setup.

For example:

  • INSERT INTO SomeTable([GIVEN],[COLNAME]) VALUES ("value1", "value2") fails, but
  • INSERT INTO SomeTable([Given],[ColName]) VALUES ("value1", "value2") works.

EDIT Just saw this:

http://msdn.microsoft.com/en-us/library/ms190920.aspx

so that means it should be possible to change a column's collation without emptying all the data and recreating the related table?

Solomon Rutzky
  • 46,688
  • 9
  • 128
  • 171
Skippy Fastol
  • 1,745
  • 2
  • 17
  • 32
  • I think you need to illustrate with a simple (2 column) example of what works and what fails. I think you might be saying `INSERT INTO SomeTable([GIVEN],[COLNAME]) VALUES (...)` fails but `INSERT INTO SomeTable([Given],[ColName]) VALUES (...)` works. If so, that gets into a very delicate area of the interaction between delimited identifiers (normally case-sensitive) and collations (this one is case-insensitive). – Jonathan Leffler May 03 '12 at 14:26
  • @JonathanLeffler : your example is exactly related to my problem. – Skippy Fastol May 03 '12 at 14:29
  • You need to distinguish between what is stored in the table (the data) and the names stored in the system catalog (meta-data). If, as you indicate, the problem is with the column names (not the data itself), then you need to research whether the SQL column names in a database are case-sensitive when delimited. It may also depend on how the CREATE TABLE statement is written (were the names delimited in that?). Normally, SQL is case-insensitive on column and table names; you could write `INSERT INTO SoMeTaBlE(GiVeN, cOlNaMe) VALUES("v1", "v2")` and if the names were never delimited, it'd be OK. – Jonathan Leffler May 03 '12 at 14:38
  • @Jonathan Leffler: What does "delimited" stand for here...? – Skippy Fastol May 03 '12 at 14:56
  • In Standard SQL, a 'delimited identifier' is a column name, table name, or something similar enclosed in double quotes, such as `CREATE TABLE "table"(...)`. They are used to when names are keywords or contain characters (such as spaces) that are not normally allowed in identifiers. In SQL Server, delimited identifiers are enclosed in square brackets: `[GIVEN]` etc. MySQL uses back-ticks for the same job. This is why it is crucial that you show us what you are using. There are arcane parts to SQL (and delimited identifier handling is one of them), but when you run foul of it, you have to learn. – Jonathan Leffler May 03 '12 at 15:03
  • @JonathanLeffler: Thanks for the detailed & clear explanation. In fact I use Microsoft .Net's bulk insert method, so I don't really know the exact query it sends to the DB server. I'll have to dig into the logs. – Skippy Fastol May 03 '12 at 15:07
  • In what way do they fail? Please post the error – Nick.Mc Nov 19 '17 at 13:01
  • Hi @JonathanLeffler . There are several inaccuracies in the statements you have made here regarding how identifiers work. Please see my [answer](https://stackoverflow.com/a/44086215/577765) for the corrections (in the "ADDITIONAL NOTES" section). – Solomon Rutzky Jun 28 '18 at 14:00
  • @SkippyFastol, this answer help to understand the collation ( https://stackoverflow.com/a/2305017/102475) – antonio Jul 28 '23 at 08:24

3 Answers3

8

Given this critical piece of information (that is in a comment on the question and not in the actual question):

In fact I use Microsoft .Net's bulk insert method, so I don't really know the exact query it sends to the DB server.

it makes sense that the column names are being treated as case-sensitive, even in a case-insensitive DB, since that is how the SqlBulkCopy Class works. Please see Column mappings in SqlBulkCopy are case sensitive.

ADDITIONAL NOTES

  1. When asking about an error, please always include the actual, and full, error message in the question. Simply saying that there was an error leads to a lot of guessing and wild-goose chases that in turn lead to off-topic answers.
  2. When asking a question, please do not change the circumstances that you are dealing with. For example, the question states (emphasis added):

    bulk inserts with a 'GIVEN' case setup fail, but they succeed with another 'Given' case setup.

    Yet the example statements are single INSERTs. Also, a comment on the question states:

    In fact I use Microsoft .Net's bulk insert method, so I don't really know the exact query it sends to the DB server.

    Using .NET and SqlBulkCopy is waaaay different than using BULK INSERT or INSERT, making the current question misleading, making it difficult (or even impossible) to answer correctly. This new bit of info also leads to more questions because when using SqlBulkCopy, you don't write any INSERT statements: you just write a SELECT statement and specify the name of the destination Table. If you specify column names at all for the destination Table, it is in the optional column mappings. Is that where the issue is?

  3. Regarding the "EDIT" section of the question:

    No, changing the Collation of the column won't help at all, even if you weren't using SqlBulkCopy. The Collation of a column determines how data stored in the column behaves, not how the column names (i.e. meta-data of the Table) behaves. It is the Collation of the Database itself that determines how Database-level object meta-data behaves. And in this case, you claim that the DB is using a case-insensitive Collation (correct, the _CI_ portion of the Collation name does mean "Case Insensitive").

  4. Regarding the following statements made by Jonathan Leffler on the question:

    that gets into a very delicate area of the interaction between delimited identifiers (normally case-sensitive) and collations (this one is case-insensitive).

    No, delimited identifiers are not normally case-sensitive. The sensitivities (case, accent, kana type, width, and starting in SQL Server 2017 variation selector) of delimited identifiers is the same as for non-delimited identifiers at that same level. "Same level" means that Instance-level names (Databases, Logins, etc) are controlled by the Instance-level Collation, while Database-level names (Schemas, Objects--Tables, Views, Functions, Stored Procedures, etc--, Users, etc) are controlled by the Database-level Collation. And these two levels can have different Collations.

    you need to research whether the SQL column names in a database are case-sensitive when delimited. It may also depend on how the CREATE TABLE statement is written (were the names delimited in that?). Normally, SQL is case-insensitive on column and table names; you could write INSERT INTO SoMeTaBlE(GiVeN, cOlNaMe) VALUES("v1", "v2") and if the names were never delimited, it'd be OK.

    It does not matter if the column names were delimited or not when creating the Table, at least not in terms of how their resolution is handled. Column names are Database-level meta-data, and that is controlled by the default Collation of the Database. And it is the same for all Database-level meta-data within each Databases. You cannot have some column names being case-sensitive while others are case-insensitive.

    Also, there is nothing special about Table and column names. They are Database-level meta-data just like User names, Schema names, Index names, etc. All of this meta-data is controlled by the Database's default Collation.

    Meta-data (both Instance-level and Database-level) is only "normally" case-insensitive due to the default Collation suggested during installation being a case-insensitive Collation.

    a 'delimited identifier' is a column name, table name, or something similar enclosed in double quotes, such as CREATE TABLE "table"(...)

    It is more accurate to say that a delimited identifier is an identifier enclosed in whatever character(s) the DBMS in question has defined as its delimiters. And which particular characters are used for delimiters varies between the different DBMSs.

    In SQL Server, delimited identifiers are enclosed in square brackets: [GIVEN]

    While square brackets always work as delimiters for identifiers, it is possible to use double-quotes as delimiters IF you have the session-level property of QUOTED_IDENTIFIER set to ON (which is best to always do anyway).

    There are arcane parts to SQL (and delimited identifier handling is one of them)

    Well, delimited identifiers are actually quite simple. The whole point of delimiting an identifier is to effectively ignore the rules of regular (i.e. non-delimited) identifiers. But, in terms of regular identifiers, yes, those rules are rather arcane (mainly due to the official documentation being incomplete and incorrect). So, in order to take the mystery out of how identifiers in SQL Server actually work, I did a bunch of research and published the results here (which includes links to the research itself):

    Completely Complete List of Rules for T-SQL Identifiers

For more info on Collations / Encodings / Unicode / ASCII, especially as they relate to Microsoft SQL Server, please visit:

Collations.Info

Solomon Rutzky
  • 46,688
  • 9
  • 128
  • 171
  • You're addressing SQL Server; my comment says "Standard SQL" (ISO/IEC 9075:201x, etc). These are two different animals (very different animals — Standard SQL does not recognize `[]` as quotes for delimited identifiers, amongst other crucial differences, and Standard SQL _does_ require case-sensitivity in delimited identifiers). – Jonathan Leffler Jun 28 '18 at 14:42
  • 1
    @JonathanLeffler You might be correct re: "Standard SQL", but that is irrelevant to this question because: **a)** this question is specific to Microsoft SQL Server, and **b)** you incorrectly apply those "Standard SQL" rules to SQL Server where they clearly do not apply. In each of your comments on the question you state, or imply, that SQL Server operates according to these "Standard SQL" rules, and focus on whether or not the column names were delimited when the table was created, etc. But none of that is applicable here, and is thus misleading in this context. That is what I am clarifying. – Solomon Rutzky Jun 28 '18 at 15:20
1

The fact the column names are case sensitive means that the MASTER database has been created using a case sensitive collation.

In the case I just had that lead me to investigate this, someone entered

Latin1_CS_AI instead of Latin1_CI_AS

When setting up SQL server.

Roger Willcocks
  • 1,649
  • 13
  • 27
  • 2
    Hi Roger. No, the Collation of the `master` Database has nothing to do with how column names (i.e. Database-level meta-data) are handled outside of the system DBs (`master`, `model`, `msdb`, and `tempdb`). Now, the `model` DB is used as the template for creating new DBs, but the Collation of non-system DBs can be changed easily. It can be set differently by doing `CREATE DATABASE {db_name} COLLATE {collation_name};`, or it can be changed later via `ALTER DATABASE {db_name} COLLATE {new_collation_name};`. Please see my [answer](https://stackoverflow.com/a/44086215/577765) for more details. – Solomon Rutzky Jun 28 '18 at 14:09
0

Check the collation of the columns in your table definition, and the collation of the tempdb database (i.e. the server collation). They may differ from your database collation.

Anthony Faull
  • 17,549
  • 5
  • 55
  • 73
  • would you by any chance have queries providing a compact way of doing this ? (trying to be lazy as any good developer :) ) – Skippy Fastol May 03 '12 at 14:27
  • 1
    @SkippyFastol and Anthony: while it is true that there might be differences in those 3 Collations, that has nothing to do with the issue that the O.P. is having. The issue isn't with the data _in_ the column, but with the column names themselves, which are Database-level meta-data. This is controlled by the Database's default Collation, but even that is irrelevant when using `SqlBulkCopy`. Please see my [answer](https://stackoverflow.com/a/44086215/577765) for details. – Solomon Rutzky Jun 28 '18 at 14:05