
I use Spark 2.0.2.

While learning how to write a Dataset to a Hive table, I understand there are two ways to do it:

  1. using sparkSession.sql("your sql query")
  2. dataframe.write.mode(SaveMode.&lt;mode&gt;).insertInto("tableName")

Could anyone tell me which is the preferred way of loading a Hive table using Spark?
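For reference, a minimal sketch of the two approaches (the database, table, and column names here are made up for illustration):

    import org.apache.spark.sql.{SaveMode, SparkSession}

    val spark = SparkSession.builder()
      .appName("hive-write-example")
      .enableHiveSupport()
      .getOrCreate()

    // 1. Plain SQL executed through the session
    spark.sql("INSERT INTO TABLE my_db.target_table SELECT id, name FROM my_db.staging_table")

    // 2. DataFrame writer API
    val df = spark.table("my_db.staging_table")
    df.write.mode(SaveMode.Append).insertInto("my_db.target_table")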

Jacek Laskowski

2 Answers


In general I prefer option 2: first, because for many rows you cannot realistically build such a long SQL statement, and second, because it reduces the chance of errors and other issues such as SQL injection attacks.

In the same way, with JDBC I use PreparedStatements as much as possible.
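A rough sketch of the difference (the User case class and table name are invented for the example):

    import org.apache.spark.sql.{SaveMode, SparkSession}

    val spark = SparkSession.builder()
      .appName("insert-styles")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    case class User(id: Int, name: String)
    val users = Seq(User(1, "alice"), User(2, "o'brien"))

    // Building the statement by hand: every value must be escaped correctly,
    // and the string grows with the number of rows.
    val values = users.map(u => s"(${u.id}, '${u.name}')").mkString(", ")
    spark.sql(s"INSERT INTO TABLE my_db.users VALUES $values")   // the unescaped quote in "o'brien" breaks this

    // The writer API takes the data as-is, with no string assembly,
    // much like binding parameters in a JDBC PreparedStatement.
    users.toDF().write.mode(SaveMode.Append).insertInto("my_db.users")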


Think of it this way: suppose we need to apply daily updates to a Hive table.

This can be achieved in two ways:

  1. Process all the data in the Hive table.
  2. Process only the affected partitions.

For the first option, SQL works like a gem, but keep in mind that the data volume should be small enough that reprocessing the entire table is feasible.
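For example, a full recompute via SQL might look like this (the table and column names are placeholders):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("daily-refresh")
      .enableHiveSupport()
      .getOrCreate()

    // Full reprocess: recompute the whole table and overwrite it in one statement.
    // Fine while the data volume is modest; the cost grows with the entire table.
    spark.sql(
      """INSERT OVERWRITE TABLE my_db.daily_summary
        |SELECT customer_id, SUM(amount) AS total_amount
        |FROM my_db.transactions
        |GROUP BY customer_id""".stripMargin)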

The second option works well if you want to process only the affected partitions: overwrite just those partitions (along the lines of data.write ... partitionBy ... to the table path), and write your logic so that only the affected partitions are reprocessed; see the sketch below. This approach is meant for tables holding millions to billions of records.
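A sketch of the partition-level variant, continuing with the same spark session. The table layout, path, and date are made up; in Spark 2.0.x, writing directly to the partition directory is one way to replace a single partition without touching the rest of the table:

    import org.apache.spark.sql.SaveMode

    // spark: the Hive-enabled SparkSession from the previous sketch.
    // Only yesterday's slice is recomputed and rewritten; every other
    // partition directory of the (potentially huge) table is left untouched.
    val day = "2016-11-20"
    val updated = spark.table("my_db.staging_events").where(s"event_date = '$day'")

    updated
      .drop("event_date")                   // the partition column is encoded in the path
      .write
      .mode(SaveMode.Overwrite)
      .parquet(s"hdfs:///warehouse/events/event_date=$day")

    // Register the partition with the metastore if it did not exist before.
    spark.sql(s"ALTER TABLE my_db.events ADD IF NOT EXISTS PARTITION (event_date = '$day')")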

loneStar