
I have around 2,000,000 messages in a Kafka topic, and I want to land these records in HDFS using NiFi. I am using the PutHDFS processor for this along with ConsumeKafka_0_10, but it generates many small files in HDFS, so I added a MergeContent processor to merge the records before pushing the file. This works fine for a small number of messages, but for topics with massive data it writes a single file for every record. Please help me figure out whether the configuration needs changes.

Thank you!!

BARATH

1 Answer


The Minimum Number of Entries is set to 1, which means a bin can be merged with anywhere from 1 entry up to the Maximum Number of Entries. Try setting that minimum to something much higher, like 100k.
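For concreteness, a minimal sketch of MergeContent settings along these lines. The values are illustrative, not prescriptive; Max Bin Age and heap sizing are discussed in the comments below:

```
# MergeContent processor properties (illustrative values)
Merge Strategy            = Bin-Packing Algorithm
Merge Format              = Binary Concatenation
Minimum Number of Entries = 100000    # don't release a bin before ~100k records
Maximum Number of Entries = 100000    # cap each merged file at 100k records
Max Bin Age               = 5 min     # flush a partial bin after 5 minutes regardless
Delimiter Strategy        = Text
Demarcator                = (newline) # separate the Kafka records inside the merged file
```

Note that a bin is held as flow file references in the JVM heap until it is released, so very large entry counts raise memory pressure (see the comments below).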

Bryan Bende
  • But what if there is literally only 1 entry left in the `queue`? In that case, will that entry pile up until it receives more records? – Vishrant Jul 18 '18 at 14:10
  • You would typically set a Max Bin Age so that you'd wait for a certain amount of time, but if there's only 1 entry and the time is reached, then it sends out just that one. – Bryan Bende Jul 18 '18 at 14:12
  • Got it. Thanks. – Vishrant Jul 18 '18 at 14:13
  • @Bryan Bende, when I set the Minimum and Maximum entries to 100000 or more, the application crashes and stops loading the data after some time. Is this behavior expected while using the MergeContent processor? – BARATH Jul 25 '18 at 16:20
  • What is the error when it crashes? If it is an OutOfMemory exception, then you may need to increase your heap size in order to keep this many flow file pointers in memory. – Bryan Bende Jul 25 '18 at 16:42
  • Yeah, it is an out of memory exception, and after that the NiFi portal has gone down and nothing is working; I am not able to stop or start any processors. – BARATH Jul 25 '18 at 17:45
  • What is your min/max heap set to in bootstrap.conf? – Bryan Bende Jul 25 '18 at 18:30
  • Not able to push the data into MergeContent in NiFi; the data is in the queue and doesn't enter MergeContent. Any specific reasons? – Aditya Verma Jun 03 '20 at 10:16
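Regarding the OutOfMemory discussion above: NiFi's JVM heap is configured in conf/bootstrap.conf via the java.arg entries. A minimal sketch, assuming the stock argument numbering (the 4g figure is illustrative; the shipped defaults are much lower):

```
# conf/bootstrap.conf (illustrative heap sizing)
java.arg.2=-Xms4g   # initial JVM heap
java.arg.3=-Xmx4g   # maximum JVM heap; must accommodate the flow file references held in open bins
```

A common way to avoid very large heaps is to cascade two MergeContent processors (for example, two stages of ~1,000 entries each) so that no single bin has to hold 100k flow file references at once.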