I need to push the output of the PubMed data source into a Kafka queue. Each source could be treated as a Kafka topic. (I know the Kafka concepts and have explored Kafka using Python.)

I am able to view the PubMed data through FireFTP.

Can anyone help me with how to proceed?

jonrsharpe
Soundarya Thiagarajan

1 Answer

You will want to use a service that downloads the data from FTP and spools it to Kafka. Apache Flume does exactly that, and it is quite easy to configure. You can either use a custom source for FTP (https://github.com/keedio/flume-ftp-source), or use a cron job that downloads the files to a spool directory and have Flume pick up the files from there. Flume has a very decent Kafka sink that allows writing continuously to Kafka.
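For the cron-job variant, here is a minimal Python sketch of the download step, using only the standard library. It is not part of Flume. The function name `download_new_files`, the local directories, and the FTP host and remote path in the usage section are all assumptions for illustration, so adjust them to your setup. Each file is written to a separate staging directory first and then moved into the spool directory, so that Flume never sees a half-written file. The staging and spool directories should sit on the same filesystem so the move is atomic.

```python
import os
from ftplib import FTP


def download_new_files(ftp, remote_dir, staging_dir, spool_dir):
    """Fetch files from remote_dir that are not yet in spool_dir.

    `ftp` is a connected, logged-in ftplib.FTP instance.
    Returns the list of file names downloaded in this run.
    """
    os.makedirs(staging_dir, exist_ok=True)
    os.makedirs(spool_dir, exist_ok=True)
    ftp.cwd(remote_dir)

    fetched = []
    # Assumption: remote_dir contains only plain files, no subdirectories.
    for name in ftp.nlst():
        final_path = os.path.join(spool_dir, name)
        # By default, Flume's spooling directory source renames a file it
        # has fully ingested by appending ".COMPLETED"; skip those as well.
        if os.path.exists(final_path) or os.path.exists(final_path + ".COMPLETED"):
            continue

        # Download into the staging directory first.
        tmp_path = os.path.join(staging_dir, name)
        with open(tmp_path, "wb") as fh:
            ftp.retrbinary("RETR " + name, fh.write)

        # Move the finished file into the spool directory in one step.
        os.replace(tmp_path, final_path)
        fetched.append(name)
    return fetched


if __name__ == "__main__":
    # Assumed host and path; this would fetch every file in the directory,
    # which can be a very large download.
    with FTP("ftp.ncbi.nlm.nih.gov") as ftp:
        ftp.login()  # anonymous login
        names = download_new_files(
            ftp,
            "/pubmed/baseline",
            "/var/spool/pubmed-staging",
            "/var/spool/pubmed",
        )
        print("downloaded", len(names), "files")
```

Run from cron, the script only fetches files it has not seen before. Flume's spooling directory source would then be pointed at the spool directory (`/var/spool/pubmed` in this sketch), with the Kafka sink writing to the topic for that source.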

Erik Schmiegelow