All,
I am consuming data from Kafka and dumping it into HDFS. Consuming the data works, and I now want to get the total count of records read from Kafka and save it as a file in HDFS so that I can use that file for validation. I was able to print the records to the console, but I am not sure how to create a file containing the total count.
Query to pull records from Kafka:
Dataset ds1 = ds.filter(args[5]);
StreamingQuery query = ds1
    .coalesce(10)
    .writeStream()
    .format("parquet")
    .option("path", path.toString())
    .option("checkpointLocation", args[6] + "/checkpoints" + args[2])
    .trigger(Trigger.Once())
    .start();
try {
    query.awaitTermination();
} catch (StreamingQueryException e) {
    e.printStackTrace();
    System.exit(1);
}
And this is the code I have written to count the records and print them to the console:
Dataset stream = ds1.groupBy("<column_name>").count();
// I actually want the total count without using groupBy. I tried long stream = ds1.count(),
// but I ran into an error.
StreamingQuery query1 = stream.coalesce(1)
    .writeStream()
    .format("csv")
    .option("path", path + "/record")
    .start();
try {
    query1.awaitTermination();
} catch (StreamingQueryException e) {
    e.printStackTrace();
    System.exit(1);
}
This is not working. Can you please help me solve this problem?
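For reference, here is a minimal sketch of one possible direction, not a confirmed fix. It assumes Spark 2.4 or later, where foreachBatch is available. Calling count() directly on a streaming Dataset is not allowed, because streaming queries have to be started through writeStream().start(). Inside foreachBatch, however, each micro-batch arrives as a regular Dataset, so count() works there. The file sink also only supports append mode, which a streaming aggregation without a watermark cannot use, and that is a likely reason the CSV query above fails. The names BatchCountWriter, writeBatchCount, startWithCounts, dataPath, countDir, checkpointDir and the record_count column are hypothetical and chosen only for illustration.

```java
import java.util.Collections;

import org.apache.spark.api.java.function.VoidFunction2;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.streaming.StreamingQuery;
import org.apache.spark.sql.streaming.Trigger;

public class BatchCountWriter {

    // Counts the rows of one micro-batch and writes the count as a one-row CSV file.
    // The output directory contains the batch id and is overwritten, so a re-run of
    // the same batch replaces its count instead of adding a second file.
    static long writeBatchCount(Dataset<Row> batch, long batchId, String countDir) {
        long total = batch.count();
        batch.sparkSession()
            .createDataset(Collections.singletonList(total), Encoders.LONG())
            .toDF("record_count")
            .coalesce(1)
            .write()
            .mode(SaveMode.Overwrite)
            .option("header", "true")
            .csv(countDir + "/batch_id=" + batchId);
        return total;
    }

    // Hypothetical wiring: one streaming query writes both the parquet data and the count.
    // Declared with "throws Exception" because start() declares a checked
    // TimeoutException in Spark 3.x.
    static StreamingQuery startWithCounts(Dataset<Row> ds1, String dataPath,
                                          String countDir, String checkpointDir)
            throws Exception {
        return ds1.writeStream()
            .foreachBatch((VoidFunction2<Dataset<Row>, Long>) (batch, batchId) -> {
                batch.persist(); // the batch is used twice: once to count, once to write
                writeBatchCount(batch, batchId, countDir);
                batch.coalesce(10).write().mode(SaveMode.Append).parquet(dataPath);
                batch.unpersist();
            })
            .option("checkpointLocation", checkpointDir)
            .trigger(Trigger.Once())
            .start();
    }
}
```

With Trigger.Once() this would normally produce a single batch_id directory per run. One trade-off to keep in mind: foreachBatch gives at-least-once guarantees by default, so the parquet append here could write duplicates if a batch is retried, unlike the built-in file sink used in the first query.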