Hive's ACID support allows deleting rows from a table using the following syntax:
DELETE FROM table
WHERE id IN (SELECT id FROM raw_table)
But what is the best way to delete rows when the primary key is composed of several columns?
I have tried the following with EXISTS:
DELETE FROM table
WHERE EXISTS (SELECT id1, id2 FROM raw_table
WHERE raw_table.id1 = table.id1 AND raw_table.id2 = table.id2)
Or the following, which concatenates all the key columns (I am not sure whether this is valid):
DELETE FROM table
WHERE CONCAT(id1, id2) IN (SELECT CONCAT(id1, id2) FROM raw_table)
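My doubt about the concatenation approach is that two different key pairs can produce the same string. Here is a minimal sketch of the problem, using made-up literal values rather than my real data, and assuming '|' never appears in the key values:

```sql
-- Two different (id1, id2) pairs collapse to the same string '123',
-- so the plain CONCAT version could delete rows that should be kept.
SELECT CONCAT('1', '23') = CONCAT('12', '3');                    -- true

-- With a separator that cannot occur in the keys (assumed here: '|'),
-- the two pairs stay distinct: '1|23' vs '12|3'.
SELECT CONCAT_WS('|', '1', '23') = CONCAT_WS('|', '12', '3');    -- false
```

So if the concatenation approach is acceptable at all, I assume it would need a separator like this, which is part of why I am unsure about it.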
Do you have any advice on which solution is best?