I am trying to use Spark Streaming and Kafka to ingest and process messages received from a web server.
I am testing the consumer mentioned in https://github.com/dibbhatt/kafka-spark-consumer/blob/master/README.md to take advantage of the extra features it offers.
As a first step, I am running the example provided just to see how it works. However, I am having difficulty actually seeing the data in the payload.
Looking at the result of the following function:

ReceiverLauncher.launch

I can see it returns a DStream, i.e. a sequence of RDDs whose records are of type:

MessageAndMetadata[Array[Byte]]
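For reference, this is roughly how I am calling it, adapted from the README example (the connection properties are placeholders for my setup, and the variable names are mine):

```scala
import java.util.Properties
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import consumer.kafka.ReceiverLauncher

val sparkConf = new SparkConf().setAppName("kafka-spark-consumer-test")
val ssc = new StreamingContext(sparkConf, Seconds(10))

// ZooKeeper/Kafka connection settings, filled in per the README
// (e.g. "zookeeper.hosts", "zookeeper.port", "kafka.topic", ...)
val props = new Properties()

val numberOfReceivers = 1
val tmpStream = ReceiverLauncher.launch(
  ssc, props, numberOfReceivers, StorageLevel.MEMORY_ONLY)
```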
I am stuck at this point and don't know how to parse this to see the actual data. All the examples on the web that use the consumer that ships with Spark create an iterator, walk through it, and process the messages. The object returned by this custom consumer, however, doesn't give me any iterator interface to start with.
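For context, the pattern I keep seeing with the built-in consumer looks roughly like this (a sketch with placeholder connection strings; KafkaUtils.createStream returns (key, message) string pairs, so the data is directly readable):

```scala
import org.apache.spark.streaming.kafka.KafkaUtils

// Built-in consumer: the DStream carries (key, message) string pairs,
// so iterating over each partition exposes the messages directly.
val kafkaStream = KafkaUtils.createStream(
  ssc, "zkhost:2181", "my-consumer-group", Map("my-topic" -> 1))

kafkaStream.foreachRDD { rdd =>
  rdd.foreachPartition { iter =>
    iter.foreach { case (_, message) => println(message) }
  }
}
```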
There is a getPayload() method on each MessageAndMetadata record, but I don't know how to get from it to the actual data.
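Based on the type signature, my best guess is to map over the stream and decode the payload bytes, something like the sketch below (decoding as UTF-8 is an assumption about how my web server serializes the messages):

```scala
// tmpStream is the DStream[MessageAndMetadata[Array[Byte]]] returned by
// ReceiverLauncher.launch above. getPayload returns the raw message bytes;
// decoding them as UTF-8 assumes the producer wrote plain text.
val lines = tmpStream.map(mmd => new String(mmd.getPayload, "UTF-8"))

lines.print() // show a few decoded messages per batch, just for inspection
```

Is that the intended usage, or is there a proper decoder/iterator API that I am missing?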
The questions I have are:

1. Is this consumer actually a good choice for a production environment? From the looks of it, the features it offers and the abstraction it provides seem very promising.
2. Has anybody ever tried it? Does anybody know how to get to the data?
Thanks in advance,
Moe