From browsing the web, I understand that DELETE and UPDATE statements are not supported on Impala/Hive. I'm trying to find a workaround for this case. I tried to do it with an INSERT OVERWRITE statement, with no success.

I have a partitioned table with the columns user_id, day, month, year (partitioned on day, month, year).

Say I have one row for each date (each date is represented by those three partition columns) and I want to delete the row for 2016-05-01.

If I were using MySQL I would write:

DELETE FROM tblname WHERE year = 2016 and month = 5 and day = 1

How do I do the same on Hive/Impala?

Thank you!

shayms8

1 Answer

Partition your data so that the rows you want to delete fall in a partition of their own (you can use the row_number window function to identify them). You can then drop that partition without affecting the rest of your table. This is a fairly sustainable model, even if your dataset grows quite large.
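For the table in the question, which is already partitioned by date, dropping one day's partition might look like the sketch below. It assumes the table is named `tblname` (as in the question's MySQL example) and that the partition columns `year`, `month`, and `day` are integers.

```sql
-- Sketch: remove every row for 2016-05-01 by dropping that one partition.
-- Assumes integer partition columns named year, month, day.
ALTER TABLE tblname DROP IF EXISTS PARTITION (year = 2016, month = 5, day = 1);
```

For a managed (internal) table this also deletes the partition's data files; for an external table, typically only the partition metadata is removed and the files stay in place.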

More detail about partitioning:

www.tutorialspoint.com/hive/hive_partitioning.htm
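If only some of the rows inside a partition need to go, one possible approach is to rewrite that partition with the rows you want to keep, using INSERT OVERWRITE. The sketch below uses Hive syntax and assumes the table is named `tblname`, that `user_id` is its only non-partition column, and that the row to delete has the hypothetical value `user_id = 42`.

```sql
-- Sketch: "delete" user_id 42 from the 2016-05-01 partition by
-- overwriting that partition with every other row it contains.
-- tblname and the value 42 are assumptions for illustration.
INSERT OVERWRITE TABLE tblname PARTITION (year = 2016, month = 5, day = 1)
SELECT user_id
FROM tblname
WHERE year = 2016 AND month = 5 AND day = 1
  AND user_id <> 42;
```

Only the named partition is replaced; the other partitions of the table are left untouched.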

sandeep rawat
  • Thanks! It worked! I used the `drop partition` that you mentioned: `ALTER TABLE table_name DROP [IF EXISTS] PARTITION partition_spec, PARTITION partition_spec,...;` I didn't use the `row_number` feature because my table is already partitioned, as I said. – shayms8 Jul 01 '16 at 15:45
  • Please see http://meta.stackexchange.com/questions/5234/how-does-accepting-an-answer-work – sandeep rawat Jul 06 '16 at 13:06