5

I have a flag column in Hive table that I want to update after some processing. I have tried using hive and impala using the below query but it didn't work, and got that it needs to be a kudu table while the table I have is a non-kudu table. Is there a way to update it like this query below ?

UPDATE table_name SET flag_col = 1
where [condition];
leftjoin
  • 36,950
  • 8
  • 57
  • 116
Omar AlSaghier
  • 340
  • 4
  • 12

2 Answers2

3

Overwrite the whole table with calculated values, all other columns as is:

insert overwrite table table_name 
select col1, col2, ..., coln, 
       case when [condition] then 1 else flag_col end as flag_col,
       colO, 
       colP...
  from table_name ;

Read documentation for more details on partitioned tables, etc: https://docs.cloudera.com/documentation/enterprise/5-8-x/topics/impala_insert.html

leftjoin
  • 36,950
  • 8
  • 57
  • 116
  • Thanks for your answer. But as you know, can I use INSERT OVERWRITE and overwriting only specific values based on a condition? For example I want to update the flag for only a specific date? Also please note that I have 3 conditions to match if that helps – Omar AlSaghier Jan 12 '21 at 09:33
  • @OmarAlSaghier What difference does in make? Select all columns as is except those you need to update. For all columns you need to update write conditional expressions, with all conditions, does not matter how many conditions. And keep order of columns the same as in target table – leftjoin Jan 12 '21 at 09:54
  • I got your point .. but I am facing my issues with the [condition], because it is matching values from another table like: Select * CASE WHEN table1.col1 = table2.col1 and table1.col2 = table2.col2 THEN 1 ELSE falg END AS flag from table1, table2.. But like this, I am getting billions of duplicates. While when I tried it with constant values it worked. Do you have any ideas please? – Omar AlSaghier Jan 12 '21 at 12:39
  • @OmarAlSaghier Join tables correctly:` from table1 JOIN table2 on ` Only you know which are possible join keys which will not duplicate.Without join, if you write from table1, table2 it is cross (cartesian) product – leftjoin Jan 12 '21 at 12:48
  • @OmarAlSaghier Better prepare some data example from both tables, it is not clear. If you want to check if data exists in table2, use LEFT JOIN on and in CASE check if some key column from t2 is not null (joined) – leftjoin Jan 12 '21 at 13:37
  • The whole issue is that I have a main table **main_table** which has a _flagCol_ and has all on the information.. and there is another table **temp_table** that is truncated and overwritten with a certrain number of information to do some process on them. After that, I want to update the flag in the **main_table** for the data proessed in the **temp_table** based on this query: 'Select col_4, CASE WHEN table1.col1 = table2.col1 and table1.col2 = table2.col2 THEN 1 ELSE falg END AS flag, col_3 from table1, table2' So I am matching the two tables based on multiple conditions – Omar AlSaghier Jan 12 '21 at 14:12
  • @OmarAlSaghier from table1, table2 - this is not correct. It should be: from table1 left join table2 ON . And check that join does not duplicate. If it is duplicate - then join condition is wrong. Maybe it is `table1.col1 = table2.col1 and table1.col2 = table2.col2` ?? then in the CASE you need to check if table 2 has joined like this CASE when table1.col1 is not null then 1 else.... end as flag. Just describe your task with all the details+ data example from both – leftjoin Jan 12 '21 at 14:28
1

Hive doesn't support updates (or deletes), but it supports INSERT INTO, so it is possible to add new rows to an existing table.

> insert overwrite table table_name 
> select *,
 case when [condition] then 1 else flag_col end as flag_col,
 from table_name 

//If you want to use you can add where// > where id <> 1;
  • 1
    Pefect .. Thanks for your help, but I think I need to rewrite all names of the columns, because * and then the CASE will add a new column I think – Omar AlSaghier Jan 12 '21 at 08:37