
I have a Hive ORC table with around 2 million records. Currently, to update or delete records, I load the entire table into a DataFrame, apply the change, save the result as a new DataFrame, and write it back with Overwrite mode (the command is below). So, to update a single record, do I really need to load and process the entire table's data?

I'm unable to run objHiveContext.sql("update myTable set columnName='' "). I'm using Spark 1.4.1 and Hive 1.2.1.

myData.write.format("orc").mode(SaveMode.Overwrite).saveAsTable("myTable"), where myData is the updated DataFrame.
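
For context, here is a minimal sketch of the full-table rewrite workflow described above. The column name targetColumn and the filter condition on id are hypothetical placeholders, not from the original post; depending on the Spark version, overwriting a table that is read in the same job may require writing to a staging table first.

    import org.apache.spark.sql.SaveMode
    import org.apache.spark.sql.hive.HiveContext
    import org.apache.spark.sql.functions.{when, col, lit}

    val objHiveContext = new HiveContext(sc)

    // Load the whole table into a DataFrame
    val myTableDF = objHiveContext.table("myTable")

    // "Update" by recomputing the column for matching rows (hypothetical condition)
    val myData = myTableDF.withColumn(
      "targetColumn",
      when(col("id") === lit(42), lit("")).otherwise(col("targetColumn"))
    )

    // Overwrite the entire table with the modified DataFrame
    myData.write.format("orc").mode(SaveMode.Overwrite).saveAsTable("myTable")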

How can I avoid loading the entire 2-3 million records just to update a single record of the Hive table?

  • AFAIK you cannot. Spark is designed for large-scale analytics, not to perform fine-grained changes on external data sources. – zero323 Jan 06 '16 at 19:24
  • Thank you! How can we dispose of (delete) a DataFrame after its use? I'm getting an OutOfMemoryException; how can I increase the heap size? – sudhir Jan 07 '16 at 02:43
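
As a hedged sketch for the follow-up comment (assuming the DataFrame was explicitly cached, and that jobs are launched via spark-submit), the standard Spark approach is to unpersist the DataFrame and raise memory at submit time; the sizes shown are illustrative only:

    // Release a cached DataFrame once it is no longer needed
    myData.unpersist()

    // Heap size is set when the application is submitted, e.g.:
    //   spark-submit --driver-memory 4g --executor-memory 4g ...
    // or via spark.driver.memory / spark.executor.memory in spark-defaults.conf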

0 Answers