3

Let's consider that I have a Table A that I use to generate a Table B. I have no primary key or unique key I could use for a ON DUPLICATE KEY.

My Table B has 3 columns: A, B, and C.

I want to INSERT this only if the row already doesn't exist. I can determine if the row exist be the combination of A and B. I use a WHERE NOT EXISTS statement to do so.

INSERT INTO TableB
(A, B, C)
SELECT 
SUM(A),
SUM(B),
SUM(C)
FROM TableA
WHERE NOT EXISTS (SELECT * FROM TABLE B WHERE A=A AND B=B)
GROUP BY A, B

If Table B is empty, the row are inserted. However, as soon as Table B has a record, no rows are inserted even if they doesn't already exist in Table B.

Wistar
  • 3,770
  • 4
  • 45
  • 70
  • Why not use `UNIQUE(A, B, C)`? – Fozi Nov 06 '13 at 21:42
  • 1
    @Fozi Because A could be repeated. B can also so be repeated but the it's only the combination of the two that is unique. It's the nature of the data I work with. – Wistar Nov 06 '13 at 21:50
  • Ant that's exactly what `UNIQUE(A, B, C)` would do for you; It's not the same as `UNIQUE(A) UNIQUE(B) UNIQUE(C)`. – Fozi Nov 06 '13 at 21:54
  • Sorry, I misread the question: It should be `UNIQUE(A, B)` – Fozi Nov 06 '13 at 21:56
  • @Fozi I didn't knew that. You made a good point here. Thanks. – Wistar Nov 07 '13 at 01:45
  • @Fozi If you propose something in that way I might prefer your answer for performance perspective since the answer that I accepted before greatly slow my queries – Wistar Nov 07 '13 at 02:01

2 Answers2

5

You should prefix the columns from the outer query with its alias in the subquery:

INSERT INTO TableB (A, B, C)
  SELECT 
      SUM(A),
      SUM(B),
      SUM(C)
    FROM TableA ta
  WHERE NOT EXISTS (SELECT * FROM TableB tb WHERE tb.A=ta.A AND tb.B=ta.B)
  GROUP BY A, B

The way you wrote it, you were comparing values from table TableB with values from TableB (each row with itself), so when you inserted at least one row, the condition that "there is no row that is equal to itself" was never true.

Przemyslaw Kruglej
  • 8,003
  • 2
  • 26
  • 41
  • However this greatly increased the execution time of my query (from about 0.1s to >10s) – Wistar Nov 07 '13 at 01:53
  • 1
    @Wistar Of course it did! In your version, when there was at least one row in the `TableB`, the `NOT EXISTS` condition was never satisfied and it took only one record from `TableB` to make that condition fail. When you changed it to work properly (check values from table from outer query) it has to check all rows and see if neither of them has the same values - it must be slower (would be good to have a compound index on `TableB.A` and `TableB.B` columns). – Przemyslaw Kruglej Nov 07 '13 at 02:01
  • As @Fozi suggested, maybe using a `UNIQUE(A,B)` constraint to Table B and then using `ON DUPLICATE KEY UPDATE` might be more performance wise. – Wistar Nov 07 '13 at 02:13
1

I think that the best and faster way to do this is to apply a UNIQUE Constraint on A + B.

Alter TABLE TableB ADD CONSTRAINT unique_A_B UNIQUE(A,B);

Then use ON DUPLICATE KEY UPDATE:

INSERT INTO TableB (A,B,C)
SELECT 
      SUM(A),
      SUM(B),
      SUM(C)
    FROM TableA 
GROUP BY A, B
ON DUPLICATE KEY UPDATE C=SUM(C);

For the example purpose I updated C but I guess that in a real situation you'd want to update the Update Timestamp of your row with Now().

Wistar
  • 3,770
  • 4
  • 45
  • 70