
I have a problem with Apache NiFi. I want to move data from a database to HDFS. I have one table, year, with a single column. After moving it, I found a lot of files in HDFS that all contain the same year table. What do I have to do to remove the duplicated files? I tried the UpdateAttribute processor, but I don't know how to use it to fix the problem.

This picture shows the duplicated files with the same content in the HDFS directory.

ikram
  • These don't look like duplicated files; you just have a small-file problem. Use MergeRecord / MergeContent before PutHDFS to bucket your data into bigger files around your HDFS block size. You could also avoid this by using larger batch sizes when consuming from the DB, so that you don't produce tiny flowfiles. I would need to see your flow to suggest more. – Sdairs Jul 24 '21 at 14:51
  • I have a table year in my database. When I check HDFS after importing it with Apache NiFi, I find a lot of files containing the same year table; that is exactly the problem. Could you tell me how to configure MergeRecord and MergeContent to stop this? Thanks. – ikram Jul 28 '21 at 22:34
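
To illustrate what the merge step suggested above does, here is a minimal Python sketch of size-based bin packing. It is not NiFi code: `bin_pack`, `min_bin_bytes` and the file sizes are hypothetical names and numbers chosen for illustration. In NiFi the comparable knobs should be MergeContent's "Minimum Group Size" (or MergeRecord's "Minimum Bin Size") together with "Max Bin Age", but check the processor documentation for your version.

```python
from typing import Iterable, List


def bin_pack(sizes: Iterable[int], min_bin_bytes: int) -> List[List[int]]:
    """Group small file sizes into bins, closing a bin once it reaches min_bin_bytes.

    Hypothetical helper: it only mimics the idea of merging many small
    flowfiles into fewer, larger ones before they are written to HDFS.
    """
    bins: List[List[int]] = []
    current: List[int] = []
    current_total = 0
    for size in sizes:
        current.append(size)
        current_total += size
        if current_total >= min_bin_bytes:
            # Bin is full enough: emit it as one merged file.
            bins.append(current)
            current, current_total = [], 0
    if current:
        # Leftover bin, comparable to one flushed by a maximum bin age timeout.
        bins.append(current)
    return bins


# Assumed example: 1000 small files of 10,000 bytes each, merged to ~1 MB files.
small_files = [10_000] * 1000
merged = bin_pack(small_files, min_bin_bytes=1_000_000)
print(len(small_files), "->", len(merged))  # prints "1000 -> 10"
```

With a threshold near the HDFS block size instead of the 1 MB assumed here, the same idea turns thousands of tiny files into a handful of large ones.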

0 Answers