I am sending CSV data to a Kafka topic using kafka-python. The data is sent and is received by the consumer successfully. Now I am trying to stream a CSV file continuously: any new entry added to the file should be sent to the Kafka topic automatically. Any suggestions on how to stream a CSV file continuously would be helpful.

Below is my existing code:

    import csv
    import logging
    from json import dumps

    from kafka import KafkaProducer

    logging.basicConfig(level=logging.INFO)

    # Serialize each CSV row (a list of strings) as UTF-8 encoded JSON.
    producer = KafkaProducer(
        bootstrap_servers='127.0.0.1:9092',
        value_serializer=lambda row: dumps(row).encode('utf-8'))

    with open('C:/Hadoop/Data/Job.csv', 'r', newline='') as file:
        reader = csv.reader(file, delimiter='\t')
        for row in reader:
            producer.send('Jim_Topic', row)
        # Block until all buffered messages have been delivered.
        producer.flush()

Jim Macaulay
  • Does it have to be Python? For ingest/egress Kafka Connect is generally a much better approach. If that would be useful I can provide an answer based on it – Robin Moffatt Jun 17 '20 at 09:23
  • @RobinMoffatt, yes, please provide an answer using Kafka Connect; I will use it – Jim Macaulay Jun 17 '20 at 10:41

1 Answer

Kafka Connect (part of Apache Kafka) is a good way to do ingest and egress between Kafka and other systems, including flat files.

You can use the Kafka Connect SpoolDir connector to stream CSV files into Kafka. Install it from Confluent Hub, and then provide it with configuration for your source file:

curl -i -X PUT -H "Accept:application/json" \
    -H  "Content-Type:application/json" http://localhost:8083/connectors/source-csv-spooldir-00/config \
    -d '{
        "connector.class": "com.github.jcustenborder.kafka.connect.spooldir.SpoolDirCsvSourceConnector",
        "topic": "orders_spooldir_00",
        "input.path": "/data/unprocessed",
        "finished.path": "/data/processed",
        "error.path": "/data/error",
        "input.file.pattern": ".*\\.csv",
        "schema.generation.enabled":"true",
        "csv.first.row.as.header":"true"
        }'

See this blog for more examples and details.
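
If it does have to stay in Python, one possible approach is to keep the file open and poll it for appended lines, much like `tail -f`. The following is only a minimal sketch under some assumptions: the helper names `read_new_rows` and `stream_csv` and the parameters `poll_seconds` and `max_idle_polls` are made up for illustration, rows are assumed to be appended as complete tab-separated lines with no embedded newlines in quoted fields, and the file is assumed never to be truncated or rotated. The read position is not persisted, so a restart resends the whole file.

```python
import csv
import time


def read_new_rows(file, delimiter="\t"):
    """Return the complete rows appended to `file` since the last call."""
    rows = []
    while True:
        position = file.tell()
        line = file.readline()
        if not line:
            break  # Reached the current end of the file.
        if not line.endswith("\n"):
            # The writer has not finished this line yet: rewind and retry later.
            file.seek(position)
            break
        rows.extend(csv.reader([line], delimiter=delimiter))
    return rows


def stream_csv(path, send, poll_seconds=1.0, max_idle_polls=None):
    """Call `send(row)` for every existing row, then for each row appended later.

    Runs forever by default; `max_idle_polls` (hypothetical, mainly for testing)
    stops the loop after that many consecutive polls without new data.
    """
    with open(path, "r", newline="") as file:
        idle_polls = 0
        while max_idle_polls is None or idle_polls < max_idle_polls:
            rows = read_new_rows(file)
            if rows:
                idle_polls = 0
                for row in rows:
                    send(row)
            else:
                idle_polls += 1
                time.sleep(poll_seconds)


def main():
    # Imported here so the helpers above can be used without kafka-python installed.
    from json import dumps
    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers="127.0.0.1:9092",
        value_serializer=lambda row: dumps(row).encode("utf-8"))
    # Topic and path are taken from the question; adjust as needed.
    stream_csv("C:/Hadoop/Data/Job.csv",
               lambda row: producer.send("Jim_Topic", row))
```

Calling `main()` would send the rows already in the file and then keep polling once per second for new ones. Kafka Connect handles offsets, restarts and error files for you, which is why it is usually the more robust option.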

Robin Moffatt