
I have a Hive ORC table with around 2 million records. Currently, to update or delete records, I load the entire table into a DataFrame, apply the change, save the result as a new DataFrame, and write it back with Overwrite mode (the command is below). So, to update a single record, do I really need to load and process the entire table's data?

I'm unable to run objHiveContext.sql("update myTable set columnName='' "). I'm using Spark 1.4.1 and Hive 1.2.1.

myData.write.format("orc").mode(SaveMode.Overwrite).saveAsTable("myTable"), where myData is the updated DataFrame.
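
For context, here is a minimal sketch of the full-table rewrite workflow described above. The column name targetColumn and the filter condition on id are hypothetical placeholders, not from the original post; depending on the Spark version, overwriting a table that is read in the same job may require writing to a staging table first.

    import org.apache.spark.sql.SaveMode
    import org.apache.spark.sql.hive.HiveContext
    import org.apache.spark.sql.functions.{when, col, lit}

    val objHiveContext = new HiveContext(sc)

    // Load the whole table into a DataFrame
    val myTableDF = objHiveContext.table("myTable")

    // "Update" by recomputing the column for matching rows (hypothetical condition)
    val myData = myTableDF.withColumn(
      "targetColumn",
      when(col("id") === lit(42), lit("")).otherwise(col("targetColumn"))
    )

    // Overwrite the entire table with the modified DataFrame
    myData.write.format("orc").mode(SaveMode.Overwrite).saveAsTable("myTable")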

How can I avoid loading the entire 2-3 million records just to update a single record of the Hive table?

  • AFAIK you cannot. Spark is designed for large-scale analytics, not to perform fine-grained changes on external data sources. – zero323 Jan 06 '16 at 19:24
  • Thank you! How can we dispose of (delete) a DataFrame after its use? I'm getting an OutOfMemoryException; how can I increase the heap size? – sudhir Jan 07 '16 at 02:43
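
As a hedged sketch for the follow-up comment (assuming the DataFrame was explicitly cached, and that jobs are launched via spark-submit), the standard Spark approach is to unpersist the DataFrame and raise memory at submit time; the sizes shown are illustrative only:

    // Release a cached DataFrame once it is no longer needed
    myData.unpersist()

    // Heap size is set when the application is submitted, e.g.:
    //   spark-submit --driver-memory 4g --executor-memory 4g ...
    // or via spark.driver.memory / spark.executor.memory in spark-defaults.conf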

0 Answers