I have a Hive ORC table with around 2 million records. Currently, to update or delete a record, I load the entire table into a DataFrame, apply the change, and save the result as a new DataFrame using Overwrite mode (the command is below). So, to update a single record, do I really need to load and process the whole table?
I'm unable to run objHiveContext.sql("update myTable set columnName='' "). I'm using Spark 1.4.1 and Hive 1.2.1.
myData.write.format("orc").mode(SaveMode.Overwrite).saveAsTable("myTable")
where myData is the updated DataFrame.
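For context, the complete flow I run today looks roughly like this (a minimal sketch: the table/column names, the id == 42 filter, the new value, and the withColumn/when update are just placeholders standing in for my real update logic):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.functions.{col, when, lit}
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext(new SparkConf().setAppName("HiveOrcUpdate"))
val objHiveContext = new HiveContext(sc)

// 1. Load the ENTIRE table, even though only one row needs to change
val fullTable = objHiveContext.table("myTable")

// 2. Build the updated DataFrame; here a single row (id == 42) gets a new value
val myData = fullTable.withColumn(
  "columnName",
  when(col("id") === 42, lit("newValue")).otherwise(col("columnName"))
)

// 3. Rewrite the whole table as ORC
myData.write.format("orc").mode(SaveMode.Overwrite).saveAsTable("myTable")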
How can I avoid loading all 2-3 million records just to update a single record of the Hive table?