Let's say I have 100 flow files produced by one processor, each containing a different line. I want to get a new flow file that contains all 100 lines. How can I do that?

I have tried the MergeContent processor, but it gives me the original 100 flow files back.

Current config:

[image: screenshot of the MergeContent configuration]

Update:

I debugged the output of MergeContent. In the first step, JOIN, it seems OK, since the data is 576.34 KB and contains 100 lines. But in the second step, ATTRIBUTES_MODIFIED, it seems to output only 1 line to the final result.

[image: screenshot of the MergeContent provenance events]

Update:

This is my whole procedure:

  1. Get messages from Kafka one by one.
  2. Convert each Kafka message to a one-line string in one flow file.
  3. Merge multiple flow files into one.
  4. PutHDFS.

Now I'm stuck at step 3: I cannot get them to merge. I don't care about the order or the attributes; I just need to limit the number.

Update:

I have tried setting the correlation attribute to ${kafka.topic}, since all the flow files come from the same Kafka topic, but they still do not merge:

[image: screenshot]

xingbin
  • Is there something in common in those files? Why aren't you using a correlation attribute? – daggett May 26 '19 at 20:13
  • @daggett They don't have anything in common. I just fetch them from different places and I need to put them in one file. – xingbin May 26 '19 at 20:25
  • Just limited by number? – daggett May 26 '19 at 20:27
  • @daggett Yeah, just limited by number. I have searched for two days with no luck... – xingbin May 26 '19 at 20:29
  • @daggett This is my whole procedure. 1. Get messages from Kafka one by one. 2. Convert each Kafka message to a one-line string. 3. Merge multiple flow files into one. 4. PutHDFS. Now I'm stuck at step 3: I cannot get them to merge. I don't care about the order or the attributes; I just need to limit the number. – xingbin May 26 '19 at 20:31
  • @daggett I have tried using the Kafka topic as the correlation attribute, but it does not merge. – xingbin May 26 '19 at 20:49
  • I tried GenerateFlowFile-MergeContent-LogAttribute and it all works as expected. Could you try stopping the merge processor (so that many files build up in the queue) and then starting it? Will it merge? – daggett May 26 '19 at 22:15
  • Did you try passing the correlation attribute as ${kafka.topic} (without the quotes)? Everything else looks fine to me. Let me know if it works. Also, are all your flowfiles on the same node, or are they distributed across the cluster? – mythic May 29 '19 at 10:57

1 Answer

Are you using the original or the merged relationship from the MergeContent processor? The former gives the same 100 flowfiles back to you in case you need to do additional processing; the latter gives you a single flowfile with the contents of all the merged flowfiles. Your provenance listing suggests the merge event is happening successfully, so double check which relationships you are using. If possible, please post a screenshot of your flow.
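To make the difference between the two relationships concrete, here is a toy Python simulation. It is only a sketch of the behaviour described above, not NiFi code: `FlowFile` and `merge_content` are hypothetical names invented for illustration, and the newline demarcator is an assumption about how you want the lines joined.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FlowFile:
    """Toy stand-in for a NiFi flowfile (hypothetical, not the NiFi API)."""
    content: str
    attributes: Dict[str, str] = field(default_factory=dict)


def merge_content(flowfiles: List[FlowFile],
                  demarcator: str = "\n") -> Dict[str, List[FlowFile]]:
    """Simulate merging one bin; return flowfiles keyed by relationship name."""
    merged = FlowFile(
        # Join every input's content, one line per input (assumed demarcator).
        content=demarcator.join(ff.content for ff in flowfiles),
        attributes={"merge.count": str(len(flowfiles))},
    )
    return {
        "original": list(flowfiles),  # the inputs, passed through unchanged
        "merged": [merged],           # a single flowfile holding all the lines
    }


# 100 single-line flowfiles, as in the question.
inputs = [FlowFile(f"line {i}") for i in range(100)]
routed = merge_content(inputs)

print(len(routed["original"]))                        # 100
print(len(routed["merged"]))                          # 1
print(len(routed["merged"][0].content.splitlines()))  # 100
```

If PutHDFS is connected to `original`, you would see 100 one-line files, which matches the symptom in the question; connecting it to `merged` should yield the single 100-line file.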

Andy