
Thousands of files are generated each day, and I want to stream them using Kafka. When I read a file, each line is sent as a separate message.

I would like to know how I can send each file's content as a single message to a Kafka topic, and how a consumer can write each message from the topic to a separate file.

Matthias J. Sax
Nahush
  • Did you have a look into Kafka Connect? http://docs.confluent.io/3.0.0/connect/index.html – Matthias J. Sax Aug 24 '16 at 00:29
  • Yes I am aware of it. How can I use it here? The scenario is when I read the file each line is taken as a separate message, but I want each file to be a long single message. (File may have 30-40 lines) – Nahush Aug 24 '16 at 06:21
  • Are you using Java client, console producer, other? – Luciano Afranllie Aug 24 '16 at 12:30
  • Yes my producer will be mostly in Java but I am also open to other options. – Nahush Aug 26 '16 at 07:05
  • Hi @Nahush, can you please send me the code you used to implement this scenario? I couldn't find any references on how to write a producer for this kind of scenario. – AshrithGande Nov 07 '17 at 09:23

1 Answer


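You can write your own serializer/deserializer for handling files. For example:

Producer properties:

// Serializer settings take a class or a fully qualified class name.
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, org.apache.kafka.common.serialization.StringSerializer.class.getName());
// YOUR_FILE_SERIALIZER_URI stands for the fully qualified class name of your file serializer.
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, YOUR_FILE_SERIALIZER_URI);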

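Consumer properties:

// Deserializer settings take a class or a fully qualified class name.
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, org.apache.kafka.common.serialization.StringDeserializer.class.getName());
// YOUR_FILE_DESERIALIZER_URI stands for the fully qualified class name of your file deserializer.
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, YOUR_FILE_DESERIALIZER_URI);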
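The `ProducerConfig` and `ConsumerConfig` constants used above resolve to plain string keys, so the same settings can be sketched with `java.util.Properties` alone. The broker address, the `com.example` package, and the class name `FileTopicConfig` are hypothetical and only illustrate what the two placeholders would hold: the fully qualified names of your serializer and deserializer classes.

```java
import java.util.Properties;

// Minimal sketch of the client settings, written with the raw config keys.
// BOOTSTRAP_SERVERS and the com.example package are assumptions; adjust them to your setup.
final class FileTopicConfig {
    static final String BOOTSTRAP_SERVERS = "localhost:9092";
    static final String FILE_SERIALIZER = "com.example.FileMapSerializer";
    static final String FILE_DESERIALIZER = "com.example.MapDeserializer";

    // Settings for a producer that sends String keys and file-map values.
    static Properties producerProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", BOOTSTRAP_SERVERS);
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", FILE_SERIALIZER);
        return props;
    }

    // Settings for a consumer that reads String keys and file-map values.
    static Properties consumerProps(String groupId) {
        Properties props = new Properties();
        props.put("bootstrap.servers", BOOTSTRAP_SERVERS);
        props.put("group.id", groupId);
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", FILE_DESERIALIZER);
        return props;
    }
}
```

The resulting `Properties` objects can be passed to the `KafkaProducer` and `KafkaConsumer` constructors.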

Serializer

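import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.util.Map;

import org.apache.kafka.common.errors.SerializationException;
import org.apache.kafka.common.serialization.Serializer;

// Turns a Map (file name -> file content) into bytes using Java object serialization.
public class FileMapSerializer implements Serializer<Map<?, ?>> {

    @Override
    public void configure(Map<String, ?> configs, boolean isKey) {
        // nothing to configure
    }

    @Override
    public byte[] serialize(String topic, Map<?, ?> data) {
        // try-with-resources closes both streams, even when writing fails
        try (ByteArrayOutputStream bos = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(data);
            out.flush(); // push buffered object data into bos before copying it
            return bos.toByteArray();
        } catch (IOException e) {
            // fail loudly instead of printing the stack trace and sending a null payload
            throw new SerializationException("Could not serialize file map", e);
        }
    }

    @Override
    public void close() {
        // nothing to release
    }
}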

Deserializer

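import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.common.errors.SerializationException;
import org.apache.kafka.common.serialization.Deserializer;

// Rebuilds the Map (file name -> file content) from bytes written with Java object serialization.
public class MapDeserializer implements Deserializer<Map<?, ?>> {

    @Override
    public void configure(Map<String, ?> configs, boolean isKey) {
        // nothing to configure
    }

    @Override
    public Map<?, ?> deserialize(String topic, byte[] message) {
        if (message == null) {
            return new HashMap<String, String>(); // record without a value
        }
        // try-with-resources closes the stream, even when reading fails
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(message))) {
            Object o = in.readObject();
            if (o instanceof Map) {
                return (Map<?, ?>) o;
            }
            return new HashMap<String, String>(); // payload was not a Map
        } catch (IOException | ClassNotFoundException e) {
            // fail loudly instead of printing the stack trace and returning an empty map
            throw new SerializationException("Could not deserialize file map", e);
        }
    }

    @Override
    public void close() {
        // nothing to release
    }
}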
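Both classes rely on plain Java object serialization, so the round trip can be checked without a broker. The following is a minimal sketch that uses only the JDK; the class name `MapRoundTrip` and its methods are hypothetical helpers, not part of Kafka, and the sketch assumes both keys and values are strings.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: the same byte-level round trip the serializer and
// deserializer perform, without any Kafka dependency.
final class MapRoundTrip {

    // Map -> bytes, as the serializer would produce them.
    static byte[] toBytes(Map<String, String> map) {
        try (ByteArrayOutputStream bos = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bos)) {
            // Copy into a HashMap so the written object is known to be Serializable.
            out.writeObject(new HashMap<>(map));
            out.flush();
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // bytes -> Map, as the deserializer would rebuild it.
    @SuppressWarnings("unchecked")
    static Map<String, String> fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (Map<String, String>) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Serialized class is not on the classpath", e);
        }
    }
}
```

Calling `fromBytes(toBytes(map))` should return a map equal to the original, which is a quick way to confirm that the value type you put in the map is actually serializable.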

Compose messages in the following form:

final ProducerRecord<String, Map> kafkaMessage = new ProducerRecord<>(<TOPIC>, Integer.toString(messageId++), messageMap);

messageMap contains the file name as the key and the file content as the value. The value can be any serializable object. Each message therefore carries a map from file name to file content, holding either a single entry or several.
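As a minimal sketch of the file handling on both sides, the helper below builds such a map from one file and writes a received map back out as separate files. It uses only the JDK; the class name `FileMessages`, its method names, and the choice of UTF-8 text content are assumptions for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for converting between files and the message map.
final class FileMessages {

    // Producer side: the whole file becomes one entry (file name -> content),
    // so all lines travel together in a single message.
    static Map<String, String> toMessageMap(Path file) {
        try {
            String content = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            Map<String, String> messageMap = new HashMap<>();
            messageMap.put(file.getFileName().toString(), content);
            return messageMap;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Consumer side: each entry is written to its own file under targetDir.
    // Assumes every key is a plain file name, as produced by toMessageMap.
    static void writeFiles(Map<String, String> messageMap, Path targetDir) {
        try {
            Files.createDirectories(targetDir);
            for (Map.Entry<String, String> entry : messageMap.entrySet()) {
                // Use only the last path segment so a key cannot point outside targetDir.
                String name = Paths.get(entry.getKey()).getFileName().toString();
                Files.write(targetDir.resolve(name),
                        entry.getValue().getBytes(StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

On the producer, the result of `toMessageMap` would be the record value; in the consumer's poll loop, each record's value would be passed to `writeFiles` together with the output directory.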

jimijazz
Rambler