
I'm using Flume's HDFS sink with the SequenceFile file type to write data to HDFS, and I'm looking for a way to create "custom keys". By default, Flume uses the event timestamp as the key within a SequenceFile. In my use case, however, I would like to use a custom string as the key instead of the timestamp.

What is the best practice for implementing and configuring such a "custom key" in Flume?

Best, Thomas

Thomas Beer
    I've found the solution (special thanks to gherreros): 1) Implement a custom serializer, e.g. MyHDFSSequenceFileSerializer, that implements the interface "SequenceFileSerializer". Flume serializers let you (among other things) customize the key of Flume events before they are written to a SequenceFile. 2) Configure the Flume agent to use your custom serializer via the "hdfs.writeFormat" option. Here you have to provide the fully qualified class name of your serializer (or, more precisely, of the Builder that Flume uses to create the serializer). – Thomas Beer Sep 10 '15 at 13:48
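A minimal sketch of the two steps, assuming Flume 1.x's `org.apache.flume.sink.hdfs.SequenceFileSerializer` interface and Hadoop's `Text`/`BytesWritable` classes are on the classpath. It reads the key from a hypothetical event header named `customKey` (which you would set upstream, e.g. with an interceptor) and writes the event body unchanged as the value. The header name and the empty-key fallback are illustrative choices, not Flume defaults. The `hdfs.writeFormat` setting from step 2 is shown in the class comment.

```java
import java.util.Collections;
import java.util.Map;

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.sink.hdfs.SequenceFileSerializer;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;

/**
 * Sketch: writes a custom string key (taken from an event header)
 * instead of the default timestamp key.
 *
 * Assumed agent configuration (agent/sink names are placeholders):
 *   a1.sinks.k1.type = hdfs
 *   a1.sinks.k1.hdfs.fileType = SequenceFile
 *   a1.sinks.k1.hdfs.writeFormat = <your.package>.MyHDFSSequenceFileSerializer$Builder
 */
public class MyHDFSSequenceFileSerializer implements SequenceFileSerializer {

  // Hypothetical header carrying the desired key.
  static final String KEY_HEADER = "customKey";

  @Override
  public Class<Text> getKeyClass() {
    return Text.class;
  }

  @Override
  public Class<BytesWritable> getValueClass() {
    return BytesWritable.class;
  }

  @Override
  public Iterable<Record> serialize(Event e) {
    Map<String, String> headers = e.getHeaders();
    String key = headers.get(KEY_HEADER);
    // Use an empty key rather than failing when the header is absent.
    Text keyWritable = new Text(key != null ? key : "");
    BytesWritable value = new BytesWritable(e.getBody());
    // One event becomes exactly one SequenceFile record.
    return Collections.singletonList(new Record(keyWritable, value));
  }

  /** The class named in hdfs.writeFormat; needs a public no-arg constructor. */
  public static class Builder implements SequenceFileSerializer.Builder {
    @Override
    public SequenceFileSerializer build(Context context) {
      return new MyHDFSSequenceFileSerializer();
    }
  }
}
```

Because the key class here is `Text`, whatever reads the resulting SequenceFiles should expect `Text` keys rather than the `LongWritable` timestamps produced by the default Writable format.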
