1

I have a use case where I have to "join" multiple Kafka topics based on some criteria in StreamSets Data Collector. I wonder if there is some commonly adopted idiom that could solve such a problem?

Gill Bates

2 Answers

1

StreamSets Data Collector is really not the right tool for this sort of job, since a Data Collector pipeline can have only one origin.

You should look at StreamSets Transformer, which is built on Spark specifically to be able to join multiple streams of data and perform similar tasks.

metadaddy
0

How about using the Kafka Multitopic Consumer origin, followed by the Stream Selector processor, to route each record either to Trash or to further processing based on your criteria?
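To make the routing idea concrete, here is a rough sketch in plain Python. It is not StreamSets configuration. The topic names (`orders`, `customers`), the `customer_id` field, and both function names are hypothetical. It assumes each record carries the name of the topic it came from, and it joins one finite, in-memory batch only, so treat it as an illustration of the logic rather than a working pipeline.

```python
from collections import defaultdict


def route(record, key="customer_id"):
    """Mimic a selector stage: pick an output stream name for one record.

    Records from the (hypothetical) 'orders' and 'customers' topics that
    carry the join key are kept; everything else goes to 'trash'.
    """
    if record["topic"] in ("orders", "customers") and key in record["value"]:
        return record["topic"]
    return "trash"


def join_batch(records, key="customer_id"):
    """Route a finite batch, then inner-join orders to customers on `key`.

    Returns (joined_rows, trashed_values). Orders without a matching
    customer are simply left out of the joined result.
    """
    streams = defaultdict(list)
    for record in records:
        streams[route(record, key)].append(record["value"])

    # Index customers by the join key for constant-time lookups.
    customers = {c[key]: c for c in streams["customers"]}

    joined = [
        {**customers[order[key]], **order}
        for order in streams["orders"]
        if order[key] in customers
    ]
    return joined, streams["trash"]


if __name__ == "__main__":
    batch = [
        {"topic": "customers", "value": {"customer_id": 1, "name": "Ada"}},
        {"topic": "orders", "value": {"customer_id": 1, "order_id": "A-1"}},
        {"topic": "audit", "value": {"msg": "noise"}},
    ]
    rows, trash = join_batch(batch)
    print(rows)
    print(trash)
```

Note that the join step only works here because the whole batch is in memory at once. With unbounded streams you would need windowing or state, which is where a stream-processing engine comes in.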

eze