5

I would like to update a hive table which is in orc format , I'm able to update from my ambari hive view, but unable to run same update statement from sacla (spark-shell)

objHiveContext.sql("select * from table_name ") able to see data but when I run

objHiveContext.sql("update table_name set column_name='testing' ") unable to run , some Noviable exception(Invalid syntax near update etc) is occurring where as I'm able to update from Ambari view(As I set all the required configurations i.e TBLPROPERTIES "orc.compress"="NONE" transactional true etc)

Tried with Insert into using case statements and all but couldn't Can we UPDATE hive ORC tables from spark? If yes then what is the procedure ?

Imported below

import org.apache.spark.SparkConf
import org.apache.spark.SparkConf
import org.apache.spark._
import org.apache.spark.sql._
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.orc._

Note: I didn't apply any partition or bucketing on that table If I apply bucketing I'm even unable to view data when stored as ORC Hive Version:1.2.1 Spark version:1.4.1 Scala Version :2.10.6

sudhir
  • 1,387
  • 3
  • 25
  • 43

2 Answers2

1

Have you tried the DataFrame.write API using SaveMode.Append per the link below?

http://spark.apache.org/docs/latest/sql-programming-guide.html#manually-specifying-options

use "orc" as the format and "append" as the save mode. examples are in that link above.

Chris Fregly
  • 1,490
  • 1
  • 12
  • 8
  • It Worked! So directly we cannot update hive table. I mean if I have a table of arount one lakh records then I need to load entire table and then update and store in a new dataframe and then overwrite that table (SaveMode.Overwrite) as append will append to existing instead modifying.., Is there any alternative? One thing I saved a table in ORC format it is saving in default database , How to mention DataBase Name while saving ?? I used this command to save myData.write.format("orc").mode(SaveMode.Append).saveAsTable("yahoo_orc_table") – sudhir Dec 31 '15 at 15:29
1

Answer to sudhir question:-

How to mention DataBase Name while saving?

you can provide the database name before the table name. ex:- if your database name is orc_db and table name is yahoo_orc_table then you can mention the db name before the table name as below:-myData.write.format("orc").mode(SaveMode.Append).saveAsTable("orc_db.yahoo_orc_table")

Dyuti
  • 184
  • 4