
Using Apache Spark 2.2 Structured Streaming, I am writing a program that reads data from Kafka and writes it to Hive. I need to write bulk data arriving on a Kafka topic at about 100 records/sec.

Hive Table Created:

CREATE TABLE demo_user( timeaa BIGINT, numberbb INT, decimalcc DOUBLE, stringdd STRING, booleanee BOOLEAN ) STORED AS ORC ;

Insert via Manual Hive Query:

INSERT INTO TABLE demo_user VALUES (1514133139123, 14, 26.4, 'pravin', true);

Insert via Spark Structured Streaming code:

SparkConf conf = new SparkConf();
conf.setAppName("testing");
conf.setMaster("local[2]");
conf.set("hive.metastore.uris", "thrift://localhost:9083");
SparkSession session = SparkSession.builder().config(conf).enableHiveSupport().getOrCreate();

// workaround START: code to insert static data into hive
String insertQuery = "INSERT INTO TABLE demo_user VALUES (1514133139123, 14, 26.4, 'pravin', true)";
session.sql(insertQuery);
// workaround END:

// Solution START
Dataset<Row> dataset = readFromKafka(session); // private method reading data from Kafka's 'xyz' topic

// **My question here:**
// some code which writes dataset into hive table demo_user
// Solution END
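
For reference, a literal-row insert takes the VALUES keyword before the row. A minimal sketch of assembling the workaround statement for the demo_user columns might look like the following. The class HiveInsertBuilder and its methods are hypothetical names for illustration, not part of Spark or Hive. Escaping quotes with a backslash is an assumption about the SQL dialect in use, and the table name is assumed to come from trusted code.

```java
/** Hypothetical helper: builds a literal-row INSERT for the demo_user columns. */
final class HiveInsertBuilder {

    private HiveInsertBuilder() {}

    // Wraps a value in single quotes, escaping backslashes first and then quotes.
    // Assumption: the target dialect accepts backslash escapes in string literals.
    static String quote(String value) {
        return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    // Column order follows the CREATE TABLE statement:
    // timeaa BIGINT, numberbb INT, decimalcc DOUBLE, stringdd STRING, booleanee BOOLEAN.
    // The table name is concatenated as-is, so it must not come from user input.
    static String buildInsert(String table, long timeaa, int numberbb,
                              double decimalcc, String stringdd, boolean booleanee) {
        return "INSERT INTO TABLE " + table + " VALUES ("
                + timeaa + ", " + numberbb + ", " + decimalcc + ", "
                + quote(stringdd) + ", " + booleanee + ")";
    }

    public static void main(String[] args) {
        // Same row as the manual query in the question.
        System.out.println(buildInsert("demo_user", 1514133139123L, 14, 26.4, "pravin", true));
    }
}
```

The resulting string could then be passed to session.sql(...) in place of the hand-written query. This covers only static rows, not the streaming Dataset.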
Jacek Laskowski

1 Answer


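You do not need to create the Hive table beforehand when using either of the following calls; the table is created automatically if it does not already exist. Note that write() is the batch writer, so these calls apply to a non-streaming Dataset; a streaming Dataset is written through writeStream() instead.

dataset.write().jdbc(url, table, connectionProperties); // String url, String table, java.util.Properties connectionProperties

or use

dataset.write().saveAsTable(tableName); // String tableName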
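
To make the arguments of the jdbc call above more concrete, here is a minimal sketch of how the URL and connection properties might be prepared. The class HiveJdbcArgs is a hypothetical name, and the host, port, database, and credentials are placeholders rather than values from the question. The "user", "password", and "driver" keys are the usual JDBC connection properties; whether a particular Hive JDBC driver works as a Spark write target is not verified here.

```java
import java.util.Properties;

/** Hypothetical helper: prepares the url and connectionProperties arguments. */
final class HiveJdbcArgs {

    private HiveJdbcArgs() {}

    // Builds a HiveServer2-style JDBC URL, e.g. jdbc:hive2://localhost:10000/default.
    static String url(String host, int port, String database) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    // Collects the connection arguments that are passed along to the JDBC driver.
    static Properties connectionProperties(String user, String password) {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        // Driver class shipped with the Hive JDBC client; assumed to be on the classpath.
        props.setProperty("driver", "org.apache.hive.jdbc.HiveDriver");
        return props;
    }

    public static void main(String[] args) {
        // Placeholder connection details for illustration only.
        String url = url("localhost", 10000, "default");
        Properties props = connectionProperties("hive", "secret");
        System.out.println(url + " as user " + props.getProperty("user"));
    }
}
```

With these in hand, the call would take the form dataset.write().jdbc(url, "demo_user", props) on a batch Dataset.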