I am trying to use Spark Streaming and Kafka to ingest and process messages received from a web server.

I am testing the consumer mentioned in https://github.com/dibbhatt/kafka-spark-consumer/blob/master/README.md to take advantage of the extra features it offers.

As a first step, I am trying to use the example provided just to see how it plays out. However, I am having difficulties actually seeing the data in the payload.

Looking at the result of the following function:

ReceiverLauncher.launch

I can see it returns a collection of RDDs, each of type:

MessageAndMetadata[Array[Byte]]

I am stuck at this point and don't know how to parse this and see the actual data. All the examples on the web that use the consumer that ships with Spark create an iterator object, go through it, and process the data. However, the returned object from this custom consumer doesn't give me any iterator interfaces to start with.

There is a getPayload() method in the RDD, but I don't know how to get to the data from it.

The questions I have are:

  1. Is this consumer actually a good choice for a production environment? From the looks of it, the features it offers and the abstraction it provides seem very promising.

  2. Has anybody ever tried it? Does anybody know how to get to the data?

Thanks in advance,

Moe

  • It seems I finally got it working. I converted the result of the getPayload() function to a String and now I can print the actual data. Easier than I thought : ) – Moe Jun 26 '17 at 17:13
  • Instead of looking into an alternative receiver-based implementation, you should look into the DirectKafkaConsumer: https://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html – maasg Jun 26 '17 at 20:04

1 Answer

The byte array returned by getPayload() needs to be converted to a String, e.g.

new String(line.getPayload())
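For context, here is a minimal, self-contained sketch of that conversion. `decodePayload` is an illustrative helper, not part of the consumer's API, and it assumes the producer wrote UTF-8 text into the message payload; in the real streaming job you would apply the same conversion to each `line.getPayload()` inside your DStream processing.

```java
import java.nio.charset.StandardCharsets;

public class PayloadDecode {

    // Illustrative helper (an assumption, not the consumer's API):
    // MessageAndMetadata.getPayload() returns the raw byte[], so we
    // decode it here, explicitly as UTF-8 rather than relying on the
    // platform default charset.
    static String decodePayload(byte[] payload) {
        return new String(payload, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Simulate a payload as it would arrive from Kafka.
        byte[] payload = "hello kafka".getBytes(StandardCharsets.UTF_8);
        System.out.println(decodePayload(payload)); // prints "hello kafka"
    }
}
```

Passing the charset explicitly is safer than the bare `new String(byte[])` form, which silently uses the JVM's platform default encoding and can corrupt non-ASCII data.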
– Jiwei