0

I'm trying to import words from a CSV file. some of its records are already in the table. My Query is as below:

COPY words(word,definition,category) FROM '/home/username/Downloads/New1Names.csv' DELIMITER ',' CSV;  

Word column is unique. I receive an error in case of duplicate as below:

ERROR:  duplicate key value violates unique constraint "words_word_u"
DETAIL:  Key (word)=(johnson) already exists.
CONTEXT:  COPY words, line 1: "word definition is here,male"
Alireza
  • 6,497
  • 13
  • 59
  • 132
  • What you want is a bulk upsert, or `MERGE`. There's nothing conveniently built-in to PostgreSQL for this at present, so you need to load into a temp table then merge the data with SQL. – Craig Ringer Jan 04 '14 at 09:51

1 Answers1

2

UPDATED: You can do it this way

-- begin a transaction
BEGIN;
-- create a temporary table based on a factual table `words`
CREATE TEMP TABLE words_temp AS
SELECT word, definition, category 
  FROM words 
  WITH NO DATA
-- import data from the file into the temporary table
COPY words(word,definition,category) 
FROM '/home/username/Downloads/New1Names.csv' DELIMITER ',' CSV;  
-- prevent other concurrent writer processes to make changes while allowing to select from it
LOCK TABLE words IN EXCLUSIVE MODE;
-- insert from temporary into factual table only words that don't yet exist
INSERT INTO words(word,definition,category)
SELECT word,definition,category 
  FROM words_temp t 
 WHERE NOT EXISTS
(
  SELECT * 
    FROM words 
   WHERE word = t.word
);
-- commit the transaction and release the lock
COMMIT;
peterm
  • 91,357
  • 15
  • 148
  • 157
  • Huh. I never noticed the `WITH [NO] DATA` clause. Your answer as it stands is wrong in the presence of any kind of concurrent write access, though; it should begin with a transaction (`BEGIN;`) and before the `INSERT INTO` it should `LOCK TABLE words IN EXCLUSIVE MODE;` to ensure no other writers can cause issues. You don't need to drop the temp table really, either. Some explanation of what this is doing might be helpful for a user who clearly doesn't really know where to start, but the approach is reasonable. – Craig Ringer Jan 04 '14 at 09:50
  • @CraigRinger Thanks for your input. Fair enough. Updated the answer. But truth be told the answer had been written in assumption that it's rather one time thing or an operation that is done by a dba in a controlled environment, because OP's question doesn't give any clue to think otherwise. Therefore I find your *Your answer as it stands is wrong* a little bit strong. :) Anyway thanks again. It might help others though. – peterm Jan 06 '14 at 02:47
  • Fair call. "Doesn't consider some other issues" is more like it. – Craig Ringer Jan 06 '14 at 02:53