
I have a NiFi flow which fetches data from RDS tables and loads it into S3 as flat files. Now I need to generate another file containing the name of the file that I am loading into the S3 bucket; this needs to be a separate flow.

Example: if the flat file extracted from RDS is named RDS.txt, then the newly generated file should have rds.txt as its content, and I need to load this file into the same S3 bucket.

The problem I face: I am using a GenerateFlowFile processor and adding the flat file name as custom text in the flowfile, but I cannot set up any upstream connection for the GenerateFlowFile processor, so it keeps generating more files. If I use a MergeContent processor after GenerateFlowFile, I see duplicate values in the flowfile.

Can anyone help me out with this?

  • I think you need to rephrase your question, I am not sure I get it. Q1 - who manages the RDS filename? Q2 - why use GenerateFlowFile? Is it your trigger? – Up_One Dec 09 '20 at 09:20
  • Agreed - not clear why you are using generate flow file - perhaps provide a screenshot of your flow? Sounds like it should just be QueryDB -> UpdateAttribute -> PutS3? – Sdairs Dec 09 '20 at 11:44
  • @Up_One, I will clarify the request; – Vasanth Kumar Dec 09 '20 at 13:50
  • @Up_One, I will clarify the request. I have two requirements: 1. Extract data from RDS and load it into S3 as a text file. 2. Generate a second text file whose only content is the name of the text file that we loaded into S3. Flow 1 is working perfectly. For the second file I am using a GenerateFlowFile (which holds the name of the text file) followed by MergeContent and PutS3Object. The problem I am now facing is that GenerateFlowFile generates many flowfiles, so my S3 bucket is flooded with files. – Vasanth Kumar Dec 09 '20 at 13:56

2 Answers


I have a NiFi flow which fetches data from RDS tables and loads it into S3 as flat files. Now I need to generate another file containing the name of the file that I am loading into the S3 bucket; this needs to be a separate flow.

The easiest path is to chain something after PutS3Object that updates the flowfile contents with what you want. It would be really simple to write with ExecuteScript. Something like this:

import org.apache.nifi.processor.io.OutputStreamCallback

def ff = session.get()
if (ff) {
  // Overwrite the flowfile content with the value of its "filename" attribute
  def updated = session.write(ff, {
    it.write(ff.getAttribute("filename").bytes)
  } as OutputStreamCallback)
  // Mark the flowfile so RouteOnAttribute can tell it has already been rewritten
  updated = session.putAttribute(updated, "is_updated", "true")
  session.transfer(updated, REL_SUCCESS)
}

Then you can put a RouteOnAttribute after PutS3Object and have it route either to a dead-end (auto-terminated) relationship if it detects the is_updated attribute, or back to PutS3Object if the flowfile has not yet been updated.
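For reference, the RouteOnAttribute check could be a single user-defined property using NiFi Expression Language (the property name here is just illustrative):

```
already_updated : ${is_updated:equals('true')}
```

Flowfiles matching this property have already been rewritten and can follow the `already_updated` relationship to a dead end, while everything else follows `unmatched` on to ExecuteScript and back to PutS3Object.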

Mike Thomsen
  • Thank you Mike. As per your answer, will this generate a new flowfile with only the filename? – Vasanth Kumar Dec 09 '20 at 13:21
  • If my understanding is correct, I need to route the success relationship of the PutS3Object processor to the ExecuteScript (Groovy) processor, followed by UpdateAttribute to update the flowfile name, and then change the content of the flowfile to the required content. These may be silly questions; please bear with me as I am new to NiFi. – Vasanth Kumar Dec 09 '20 at 13:41
  • It would be better to do something like this: PutS3Object -success--> RouteOnAttribute --unmatched--> ExecuteScript --success-->PutS3Object – Mike Thomsen Dec 09 '20 at 19:17
  • From ExecuteScript you can update all of the flowfile's attributes and content. No need to have a separate GenerateFlowFile step and figure out how to time it correctly with your main flow. – Mike Thomsen Dec 09 '20 at 19:18
  • Yeah, thank you Mike. I found a simple solution for this: I added a funnel before the PutS3Object processor. The funnel's upstream receives two files, one with the extract and the other with the file name, and its downstream is connected to PutS3Object, so both files are loaded at the same time. – Vasanth Kumar Dec 10 '20 at 14:04

I got a simple solution for this: I added a funnel before the PutS3Object processor. The funnel's upstream receives two files, one with the extract and the other with the file name, and the funnel's downstream is connected to PutS3Object, so this loads both files at the same time.
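Outside NiFi, the companion file this question asks for is simply a text file whose content is the data file's name (lower-cased, per the RDS.txt → rds.txt example). A minimal illustrative sketch in Python, where the `_name.txt` suffix is a hypothetical naming convention, not part of the original flow:

```python
import tempfile
from pathlib import Path

def write_companion(data_filename: str, out_dir: Path) -> Path:
    """Write a companion file whose content is the lower-cased data file name."""
    # "_name.txt" is a hypothetical convention for the companion file's name.
    companion = out_dir / (Path(data_filename).stem + "_name.txt")
    companion.write_text(data_filename.lower())
    return companion

with tempfile.TemporaryDirectory() as d:
    p = write_companion("RDS.txt", Path(d))
    print(p.name, "->", p.read_text())  # RDS_name.txt -> rds.txt
```

In the NiFi flow itself, both this companion flowfile and the data extract are routed into the same funnel and then on to PutS3Object.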