-2

I'm trying to load the flow files into MySQL database using bulk load option. Below is the query I'm using as part of the UpdateAttribute processor and passing that query to PutSQL after updating the parameters to do bulk load.

LOAD DATA INFILE '${absolute.path}${filename}' INTO TABLE ${dest.database}.${db.table.name} FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'

When I ran the flow it's failing saying file not found exception.

. There were a total of 1 FlowFiles that failed, 0 that succeeded, and 0 that were not execute and will be routed to retry; : java.sql.BatchUpdateException: Unable to open file 'data.csv'for 'LOAD DATA INFILE command.Due to underlying IOException:`

** BEGIN NESTED EXCEPTION ** 

java.io.FileNotFoundException
MESSAGE: data.csv (No such file or directory)
java.io.FileNotFoundException: data.csv (No such file or directory).

Here MySQL Server and Nifi are on different nodes so I can't use LOAD DATA LOCAL INFILE query.

I'm not sure why I'm getting file not found exception even though I mentioned the complete absolute path of the flow file in the SQL Query.

When I use query with hard coded file name and providing the absolute path of the file in nifi node, it's working as expected.

Working:

LOAD DATA LOCAL INFILE '/path/in/nifi/node/to/file/data.csv' INTO TABLE ${dest.database}.${db.table.name} FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'}

Question is how to get the absolute path of the flow file and load the same flow file into mysql.

Flow:

enter image description here

data_addict
  • 816
  • 3
  • 15
  • 32
  • as i know, the flowfile is not present on disk as a plain file. even if you'll get the path to storage on disk, it could contain several flow files or part of one flow file. so, you have to use PutFile to store flowfile to a desired path, and then call LOAD DATA LOCAL... command with the same path. – daggett Mar 21 '18 at 17:56
  • @daggett, I tried that approach suggested by you, it working as expected, however writing the flowfile to disk seems to be a costly operation. As I have huge number of flowfiles and each flowfile is bigger in size it's hitting performance a lot. – data_addict Mar 21 '18 at 19:19
  • what if you split csv file by smaller sizes(for example 100k) and configure sequence so, that while PutFile stores physical file, the another part is executing with LOAD DATA LOCAL ? – daggett Mar 21 '18 at 20:56
  • @daggett, if the split size 100k records, as there are billions of records in the table, I might get huge number of flow files and writing those flow files to disk will be costly operation. – data_addict Mar 22 '18 at 03:54

1 Answers1

0
  • Stop the PutSQL processor and let the flowfiles queue up.
  • Once they are queued up, right click on the success relationship
    between UpdateAttribute and PutSQL and select List Queue.
  • Select any one flowfile and navigate to the Attributes tab and see if the attributes absolute.path and flowfilename exists and if
    they do exist, verify if they have the expected value set. In your case absolute.path should have the value /path/in/nifi/node/to/file and flowfilename should have the value /data.csv

Question for you: Are you setting these attributes yourself using UpdateAttribute, reason is, NiFi doesn't generate an attribute named flowfilename, it generates one with the name filename.

One more thing, make sure either the value for absolute.path ends with a / in the end or the value of flowfilename begins with a /. If not, they will be appended and the result will be /path/in/nifi/node/to/filedata.csv. You can try the append function that @Mahendra suggested, else you can simply use ${absolute.path}/${flowfilename}.

Update

I just realized that absolute.path is a core attribute like filename, filesize, mime.type, etc. Some processors use all the core attributes while some use very few which are needed. GenerateTableFetch writes absolute.path but doesn't set anything for it. That's why it has ./ which is the default value.

So my suggestion for your approach to work is, you can manually set/overwrite absolute.path attribute using UpdateAttribute (just like you have overwritten filename) and set the desired value which is /path/in/nifi/node/to/file

Sivaprasanna Sethuraman
  • 4,014
  • 5
  • 31
  • 60
  • For readability purpose I gave name as flowfilename, however Nifi is generating filename and in my query also I'm using filename. File name attribute have value data.csv and absolute.path attribute has value ./ and when I'm using ${absolute.path}/${filename} or ${absolute.path}${filename}, I'm getting filenotfound error. – data_addict Mar 21 '18 at 05:38
  • Let me get one thing clear: this `absolute.path` is manually created by you using `UpdateAttribute`? Or is it being generated by some NiFi file based processors and you are trying to use that ? It would also help, if you can export the flow and share it here. – Sivaprasanna Sethuraman Mar 21 '18 at 07:15
  • Nifi is generating the path attribute(value is / ) and when I use ${absolute.path} in my query and when I check the Nifi attributes i${absolute.path} value it's showing as ./ – data_addict Mar 21 '18 at 07:21
  • Which processor is generating that attribute? – Sivaprasanna Sethuraman Mar 21 '18 at 09:20
  • Generatetable fetch is the processor generating the SQL statements for Execute SQL and in that processor path is getting generated. – data_addict Mar 21 '18 at 09:41
  • I highly doubt that. `GenerateTableFetch` doesn't write that attribute. Path specific attributes are usually written by file based processors. Also please check https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.5.0/org.apache.nifi.processors.standard.GenerateTableFetch/index.html You can see `absolute.path` is not present in the list of attributes this processors write. – Sivaprasanna Sethuraman Mar 21 '18 at 09:51
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/167249/discussion-between-user805-and-de-santa). – data_addict Mar 21 '18 at 10:04