The text_with_headers serializer (an HDFS sink serializer) lets you save the Flume event headers instead of discarding them. Its output consists of the headers, followed by a space, then the body payload. We would like to drop the body and keep only the headers. For the HBase sink, the "RegexHbaseEventSerializer" lets us transform the events, but I cannot find a similar provision for the HDFS sink.
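As a rough illustration of the output format described in the question, the sketch below mimics one record written by a header-and-body serializer: the headers map's `toString()` output, a single space, the body, and a trailing newline. This is an approximation under stated assumptions (UTF-8 body text, a newline appended after each record), not Flume's actual implementation. `HeaderAndTextFormatSketch` and `formatRecord` are hypothetical names used only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

public class HeaderAndTextFormatSketch {

    // Approximates one output record: the headers map rendered via
    // Map.toString(), a single space, the raw body (assumed UTF-8),
    // then a newline (assumed to be appended after every record).
    static String formatRecord(Map<String, String> headers, byte[] body) {
        return headers + " " + new String(body, StandardCharsets.UTF_8) + "\n";
    }

    public static void main(String[] args) {
        // TreeMap keeps the header order deterministic for this demo.
        Map<String, String> headers = new TreeMap<>();
        headers.put("host", "web01");
        headers.put("timestamp", "1420723020000");
        byte[] body = "GET /index.html 200".getBytes(StandardCharsets.UTF_8);
        System.out.print(formatRecord(headers, body));
        // prints: {host=web01, timestamp=1420723020000} GET /index.html 200
    }
}
```

The sketch makes the problem in the question concrete: the body is always written after the headers, so a serializer of this shape cannot drop it.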
The HDFS sink actually expects a body, because that is what it writes to a file in your DFS; the headers are used for paths and such. If you just want to write the headers to HDFS instead of the original body, write an interceptor that converts the headers and writes them to the event body in the desired format. – Erik Schmiegelow Jan 08 '15 at 13:37
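A minimal sketch of the interceptor suggested above, assuming you want the headers written as tab-separated `key=value` pairs (that output format is an assumption; use whatever format you need). In a real Flume agent this logic would live in the `intercept(Event)` and `intercept(List<Event>)` methods of a class implementing `org.apache.flume.interceptor.Interceptor`, reading `event.getHeaders()` and calling `event.setBody(...)`. Here a hypothetical `MiniEvent` class stands in for Flume's `Event` so the example runs without Flume on the classpath; `HeadersToBodySketch` is likewise a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class HeadersToBodySketch {

    // Hypothetical stand-in for org.apache.flume.Event (headers + byte[] body).
    static class MiniEvent {
        final Map<String, String> headers;
        byte[] body;

        MiniEvent(Map<String, String> headers, byte[] body) {
            this.headers = headers;
            this.body = body;
        }
    }

    // Core of the interceptor: replace the body with the headers rendered
    // as tab-separated key=value pairs, sorted by key for stable output.
    static MiniEvent intercept(MiniEvent event) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : new TreeMap<>(event.headers).entrySet()) {
            if (sb.length() > 0) {
                sb.append('\t');
            }
            sb.append(e.getKey()).append('=').append(e.getValue());
        }
        event.body = sb.toString().getBytes(StandardCharsets.UTF_8);
        return event;
    }

    // Batch variant, mirroring the List-based intercept method of Flume interceptors.
    static List<MiniEvent> intercept(List<MiniEvent> events) {
        List<MiniEvent> out = new ArrayList<>(events.size());
        for (MiniEvent ev : events) {
            out.add(intercept(ev));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("timestamp", "1420723020000");
        headers.put("host", "web01");
        MiniEvent ev = new MiniEvent(headers, "original payload".getBytes(StandardCharsets.UTF_8));
        intercept(ev);
        // The original payload is gone; only the headers remain in the body.
        System.out.println(new String(ev.body, StandardCharsets.UTF_8));
    }
}
```

To use this in Flume, the real class would also need a nested `Interceptor.Builder` implementation, and the source would reference it through its `interceptors` property, with the interceptor's `type` set to the Builder's fully qualified class name. The HDFS sink can then keep the default `text` serializer, so only the rewritten body (the headers) is written to the file.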
Thanks Erik. Upon further investigation, I realized that the regex_filter interceptor might be the right one for the task. All I intend to do is modify the body (event). Your thoughts, please? – Salman Ahmed Jan 09 '15 at 14:37
If you want to modify the body, you will have to write your own interceptor. It's quite easy, actually; if you want, I can provide a small sample. – Erik Schmiegelow Jan 13 '15 at 10:52
Answer
You can set the serializer property to header_and_text, which outputs both the headers and the body.
For example:
agent.sinks.my-hdfs-sink.type = hdfs
agent.sinks.my-hdfs-sink.hdfs.fileType = DataStream
...
# very important
agent.sinks.my-hdfs-sink.serializer = header_and_text

Green Lei