
We have a Spark Streaming application that consumes the Gnip compliance stream.

In the old version of the API, the compliance stream was served by a single endpoint, but it is now split across 8 different endpoints.

We could run the same Spark application 8 times with different parameters to consume the different endpoints.

Is there a way in Spark Streaming to consume all 8 endpoints and merge them into a single stream within the same application?

Should we use a separate streaming context for each connection, or is one context enough?

Fanooos

1 Answer


I think you are looking for Spark union here.

For examples, see Concatenating datasets of different RDDs in Apache spark using scala.

As per the Spark documentation on union:

Return a new dataset that contains the union of the elements in the source dataset and the argument.
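To answer the second part of the question: a single StreamingContext is enough. You can create one DStream per endpoint and merge them with StreamingContext.union, which takes a sequence of DStreams. Here is a minimal Scala sketch; socketTextStream is used as a stand-in for your real Gnip connector, and the host/ports are placeholders:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.DStream

object ComplianceUnion {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("gnip-compliance")
    // One StreamingContext is enough for all 8 endpoint connections.
    val ssc = new StreamingContext(conf, Seconds(10))

    // Build one DStream per compliance endpoint (8 in total).
    val streams: Seq[DStream[String]] =
      (1 to 8).map(id => gnipStream(ssc, id))

    // Merge all endpoint streams into a single DStream.
    val merged: DStream[String] = ssc.union(streams)

    merged.foreachRDD { rdd =>
      // Process the unified compliance events here.
    }

    ssc.start()
    ssc.awaitTermination()
  }

  // Stand-in for the real Gnip connector: replace this with your
  // actual receiver (e.g. ssc.receiverStream with a custom Receiver
  // that connects to the given compliance partition endpoint).
  def gnipStream(ssc: StreamingContext, endpoint: Int): DStream[String] =
    ssc.socketTextStream("localhost", 9000 + endpoint)
}
```

The merged DStream behaves like any other: all subsequent transformations and output operations see events from all 8 endpoints.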

Amit Kumar