1

I have a following problem. I have a .tsv file (for movies, years and genres). I want to upload it to SQL Server 2012.

I created a table

CREATE TABLE Genres2
(
MovieName varchar(255) NOT NULL,
Year int NOT NULL,
Genre varchar(255) NOT NULL
PRIMARY KEY (MovieName, Year)
);

While uploading it

BULK INSERT Genres3
FROM 'c:\Users\genres6.tsv'
WITH
(
    FIELDTERMINATOR='\t',
    ROWTERMINATOR='\n'
);

I get an error

The duplicate key value is

But notepad++ says, that there is only one such value in the file.

Any ideas how to solve it? Thank you in advance.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Sparrow_ua
  • 664
  • 1
  • 11
  • 28
  • You defined `(MovieName, Year)` as primary key, are you sure you searched for this combination and not `MovieName` or `Year`? What should be unique? – Yosi Dahari Nov 28 '13 at 21:52
  • it is best to have an INT column as your Primary key, Have you considered using an Identity Column for a Primary Key ?? – M.Ali Nov 28 '13 at 21:56
  • just out of curiosity in your .tsv file do you have column names in your very 1st row ??? – M.Ali Nov 28 '13 at 21:57
  • 1
    @MuhammedAli - "it is best to have an INT column as your Primary key" is not a valid assumption. you have many considerations for using `bigint` or `uniqueidentifier` – Yosi Dahari Nov 28 '13 at 22:12
  • @Yosi why would you choose any of these two types you have mentioned ??? – M.Ali Nov 28 '13 at 22:14
  • I don't have Column names in the very first row – Sparrow_ua Nov 28 '13 at 22:14
  • I assigned int to the Year column. The could be duplicates with it, so I can not use it as Primary Key. Combination of MovieName and Year is unique – Sparrow_ua Nov 28 '13 at 22:18
  • `uniqueidentifier` can be generated without having to go through the database to assure uniqueness (see [this post](http://stackoverflow.com/questions/371762/what-exactly-is-guid-why-and-where-i-should-use-it) for more details), and `bigint` is more scalable (technically - you can produce more values.. ) – Yosi Dahari Nov 28 '13 at 22:18
  • @Yosi you do realise that GUID is a 16-byte and Bigint is 8 bytes in size, where as INT is only 4 byte. The bigger your Primary Key Column is in size less performance you get. – M.Ali Nov 28 '13 at 22:18
  • "bigint is more scalable" with INT you can add more then 2 billion records in a table and forget about billions if number of rows in your tables start getting into millions I think you should consider Partitioning NOT BIGINT :) – M.Ali Nov 28 '13 at 22:21
  • I'm not saying what to use here, I said this isn't a valid saying, that's all. – Yosi Dahari Nov 28 '13 at 22:22
  • 1
    @Yosi I believe it is absolutely valid to say so, as whenever possible Have you primary key as INT will give you better performance compare to any other data type.] – M.Ali Nov 28 '13 at 22:23
  • What you believe isn't what everyone else does :) I can see your point about bigint (as said here - http://stackoverflow.com/questions/2124631/sql-server-int-or-bigint-database-table-ids) but there it's totally reasonable to choose Guid over int in many cases, as said here - http://stackoverflow.com/questions/2804910/sql-guid-vs-integer – Yosi Dahari Nov 28 '13 at 22:26
  • @Yosi, @Muhammed Ali Having an autoincrement or unique column in your primary key will not give you any performance advantage if it is not used for joins or bears no meaning. Eg. having a clustered index just on `ID int identity` column in a fact table makes it only worse. – Stoleg Nov 28 '13 at 22:32

1 Answers1

2

Simple troubleshooting here:

  1. Disable or drop primary key constrain and load data.

  2. Run a query to find duplicate entries in proposed key column.

  3. If there no duplicates, then add constrain to copy to a new table with such constrain.

Stoleg
  • 8,972
  • 1
  • 21
  • 28