
I am working on the file stream connector. I have more than ten million records in the files (it's not a single file; it's partitioned by account number). I have to load these files into a topic and update my streams. I have gone through standalone Streams, and I have the following questions:

  1. Looking at the data set below, I have two account numbers, each with 5 rows. I need to group them into two records keyed by acctNbr. How do I write my source connector to read the files and implement this grouping logic?

  2. My brokers are running on Linux machines X, Y, Z. After developing the source connector, does my JAR file need to be deployed on every broker (if I run Connect in distributed mode)?

  3. I have only a 30-minute window to get each file drop into the topic. What parameters can I tune to keep the load within that window? FYI, this topic will have more than 50 partitions on a 3-broker setup.

Data set:

{"acctNbr":"1234567","secNbr":"AAPL","date":"2010-01-01","currentPrice":"10","availQnty":"10"}
{"acctNbr":"1234567","secNbr":"AAPL","date":"2010-01-02","currentPrice":"10","availQnty":"10"}
{"acctNbr":"1234567","secNbr":"AAPL","date":"2010-01-03","currentPrice":"10","availQnty":"10"}
{"acctNbr":"1234567","secNbr":"AAPL","date":"2010-01-04","currentPrice":"10","availQnty":"10"}
{"acctNbr":"1234567","secNbr":"AAPL","date":"2010-01-05","currentPrice":"10","availQnty":"10"}
{"acctNbr":"abc3355","secNbr":"AAPL","date":"2010-01-01","currentPrice":"10","availQnty":"10"}
{"acctNbr":"abc3355","secNbr":"AAPL","date":"2010-01-02","currentPrice":"10","availQnty":"10"}
{"acctNbr":"abc3355","secNbr":"AAPL","date":"2010-01-03","currentPrice":"10","availQnty":"10"}
{"acctNbr":"abc3355","secNbr":"AAPL","date":"2010-01-04","currentPrice":"10","availQnty":"10"}
{"acctNbr":"abc3355","secNbr":"AAPL","date":"2010-01-05","currentPrice":"10","availQnty":"10"}

1 Answer


> How do I write my source connector to read the files and implement this grouping logic?

The FileStream connector cannot do this, and it was never intended for such a purpose; it exists only as an example for writing your own connectors. In other words, do not use it in production. Grouping by key is a job for something downstream of ingestion, such as Kafka Streams.
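To illustrate, here is a minimal Kafka Streams sketch of that downstream grouping, assuming the raw lines were produced keyed by acctNbr into a hypothetical topic named positions-raw; the topic names, the application id, and the JSON-array aggregation are illustrative assumptions, not your actual requirements.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.KeyValueStore;

import java.util.Properties;

public class AccountGrouping {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "account-grouping");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "X:9092,Y:9092,Z:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();

        // Collapse all rows for one acctNbr into a single JSON-array value,
        // turning 5 rows per account into 1 record per account.
        KTable<String, String> byAccount = builder.<String, String>stream("positions-raw")
                .groupByKey()
                .aggregate(
                        () -> "[]",
                        (acctNbr, row, agg) -> agg.equals("[]")
                                ? "[" + row + "]"
                                : agg.substring(0, agg.length() - 1) + "," + row + "]",
                        Materialized.<String, String, KeyValueStore<Bytes, byte[]>>as("positions-by-account"));

        byAccount.toStream().to("positions-grouped");

        new KafkaStreams(builder.build(), props).start();
    }
}
```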

For the ingestion itself, you can use alternative solutions like Flume, Filebeat, Fluentd, NiFi, or StreamSets to glob your file paths and send all records line by line into a Kafka topic.
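For comparison, a bare-bones sketch of that "read files, produce line by line" approach with the plain Java producer follows; the directory path, glob, topic name, and broker addresses are assumptions, and any of the tools above would replace this code in practice. Keying each record by acctNbr puts all rows for an account in the same partition, which sets up the grouping step.

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;
import java.util.stream.Stream;

public class FileDropProducer {
    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "X:9092,Y:9092,Z:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        ObjectMapper mapper = new ObjectMapper();
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props);
             DirectoryStream<Path> files = Files.newDirectoryStream(Paths.get("/data/drop"), "*.json")) {
            for (Path file : files) {
                try (Stream<String> lines = Files.lines(file)) {
                    lines.forEach(line -> {
                        try {
                            // Key by acctNbr so all rows for an account land in
                            // the same partition, ready for downstream grouping.
                            String key = mapper.readTree(line).get("acctNbr").asText();
                            producer.send(new ProducerRecord<>("positions-raw", key, line));
                        } catch (IOException e) {
                            throw new RuntimeException(e);
                        }
                    });
                }
            }
        }
    }
}
```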

> After developing the source connector, does my JAR file need to be deployed on every broker?

You should not run Connect on any broker. The Connect servers are called workers, and in distributed mode every worker in the Connect cluster needs the connector JAR on its plugin path, not the brokers.
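For reference, a distributed worker is driven by a properties file along these lines; the hostnames and paths are illustrative, but the keys are standard worker settings:

```
# Minimal distributed-mode worker config sketch (connect-distributed.properties).
bootstrap.servers=X:9092,Y:9092,Z:9092
group.id=connect-cluster
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
offset.storage.topic=connect-offsets
config.storage.topic=connect-configs
status.storage.topic=connect-status
# Every worker (not broker) needs the connector JAR under this path
plugin.path=/opt/connect-plugins
```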

> I have only a 30-minute window to get each file drop into the topic.

It is not clear where that number comes from. All of the methods listed above watch for new files continuously, with no fixed window. If the real constraint is throughput rather than scheduling, the tuning happens mostly on the producer side, as sketched below.
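The usual starting points are producer batching and compression settings; the values below are illustrative, not tuned for your record sizes or network:

```
# Illustrative producer throughput settings; benchmark rather than copy.
# Larger batches per partition before each send
batch.size=262144
# Wait briefly so batches can fill
linger.ms=50
# Compress batches on the wire
compression.type=lz4
# Leader-only acks trades durability for latency, if acceptable
acks=1
buffer.memory=67108864
```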
