Environment:
Spark version: 2.3.0
Run Mode: Local
Java version: Java 8
The Spark application tries to do the following:
1) Convert input data into a Dataset[GenericRecord]
2) Group by the key property of the GenericRecord
3) Using mapGroups after…
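For reference, the group-then-mapGroups shape of steps 2 and 3 can be sketched roughly as below. This is only a minimal sketch, not my actual job: `Event`, `KeyTotal`, `GroupSketch`, and `totalsByKey` are hypothetical names, and a plain case class stands in for GenericRecord so that `spark.implicits._` can supply the encoders. With GenericRecord itself an explicit encoder (for example one built with `Encoders.kryo`) would presumably be needed.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical stand-in for the Avro GenericRecord (assumption, not the real schema).
case class Event(key: String, value: Long)
// Hypothetical per-key result type.
case class KeyTotal(key: String, total: Long)

object GroupSketch {
  // Steps 2 and 3: group by the key field, then collapse each group with mapGroups.
  def totalsByKey(events: Dataset[Event]): Dataset[KeyTotal] = {
    import events.sparkSession.implicits._
    events
      .groupByKey(_.key)
      .mapGroups { (key, rows) => KeyTotal(key, rows.map(_.value).sum) }
  }

  def main(args: Array[String]): Unit = {
    // Local mode, matching the environment described above.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("group-sketch")
      .getOrCreate()
    import spark.implicits._

    // Step 1 (simplified): build a typed Dataset from in-memory rows.
    val events = Seq(Event("a", 1L), Event("a", 2L), Event("b", 5L)).toDS()
    totalsByKey(events).show()

    spark.stop()
  }
}
```

The iterator passed to mapGroups is consumed once per group, so anything that needs a second pass over the rows has to materialize them first.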
My pipeline is the following:
Source web services --> Kafka producer --> topics --> Spark jobs --> HDFS/Hive
I have two design-related questions:
I need to pull the data from the DataSourceAPIs (web service URLs) and push it to the topics.
If I use…