
I have billions of rows in SQL Server tables and I'm using NiFi to load those rows into MySQL. I tried both PutSQL and PutDatabaseRecord; they satisfy the requirement, but they take quite a long time to load the data into MySQL (about 100k records per minute, so 1 billion rows would take roughly 10,000 minutes) because they insert record by record. Is there any bulk-load option in NiFi for loading AVRO/CSV flowfiles into MySQL?

Here is my Flow :

ListDatabaseTables -> GenerateTableFetch (Partition Size = 50k records, so each flowfile holds 50k records) -> ExecuteSQL -> ConvertRecord (Avro to CSV) -> PutSQL

ListDatabaseTables -> GenerateTableFetch (Partition Size = 50k records) -> ExecuteSQL (Avro) -> PutDatabaseRecord

Any suggestions please.
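One thing worth checking before abandoning the JDBC processors: both `PutSQL` and `PutDatabaseRecord` talk to MySQL through Connector/J, which only rewrites batched inserts into true multi-row `INSERT` statements when `rewriteBatchedStatements=true` is set on the connection URL. A sketch of the URL for the `DBCPConnectionPool` controller service (host, port, and database name are placeholders):

```
jdbc:mysql://mysql-host:3306/target_db?rewriteBatchedStatements=true&useServerPrepStmts=false
```

Without this flag, even a large `Batch Size` is still sent as one `INSERT` per row, which matches the record-by-record behavior described above.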

data_addict
  • are you using `Batch Size` in `PutSQL`? – daggett Mar 21 '18 at 07:52
  • probably you can use [LOAD DATA INFILE](https://dev.mysql.com/doc/refman/5.7/en/load-data.html) MySQL command through `ExecuteGroovyScript`. – daggett Mar 21 '18 at 07:54
  • @daggett, I tried the LOAD DATA INFILE approach, however I'm facing a file-not-found error. https://stackoverflow.com/questions/49380307/flowfile-absolute-path-nifi/49398500?noredirect=1#comment85805848_49398500. Could you please have a look at the link? – data_addict Mar 21 '18 at 09:38
  • @user805 I have updated my answer in the original question with a new approach. Let me know, if that works for you. – Sivaprasanna Sethuraman Mar 21 '18 at 17:21
  • @SivaprasannaSethuraman, getting the path of the flowfile is not working, as the flowfile is stored in the content repository and there are multiple directories under it. – data_addict Mar 21 '18 at 19:30
  • I'm not asking you to use the path of the content repository. Based on your flow, you are reading data using `GenerateTableFetch` (SQL Server) and want to bulk load that data into MySQL. You can introduce an intermediate processor, say `PutFile`, which will write the result of the `GenerateTableFetch` flow to the path configured in `PutFile`. In a following `UpdateAttribute`, create a `path` attribute (the name can be anything) and set it to that directory. – Sivaprasanna Sethuraman Mar 22 '18 at 01:39
  • @SivaprasannaSethuraman, I tried the approach you suggested and it's working as expected; however, writing the flowfile to disk (`PutFile`) seems to be a costly operation. As I have a huge number of flowfiles and each flowfile is large, it's hurting performance a lot. Any other possible solution? – data_addict Mar 22 '18 at 03:51
  • Unfortunately, no other possible solution using NiFi, AFAIK. Is it a requirement that this be done through NiFi? There are other migration tools that help you move/migrate data from one RDB to another. – Sivaprasanna Sethuraman Mar 22 '18 at 05:04
  • Okay, thanks for your suggestions @SivaprasannaSethuraman – data_addict Mar 22 '18 at 05:07
  • Or you can do one thing: instead of using `ExecuteSQL`, use `PutDatabaseRecord`, which will read the flowfile content as records and then ingest those records into the destination RDB table (MySQL). I haven't tried it, though, so I can't say how much performance gain you'd get compared to writing to file and loading. – Sivaprasanna Sethuraman Mar 22 '18 at 05:13
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/167308/discussion-between-user805-and-sivaprasanna-sethuraman). – data_addict Mar 22 '18 at 05:15
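The PutFile + `LOAD DATA INFILE` approach discussed in the comments boils down to a statement like the following, issued against MySQL after `PutFile` has written the CSV to disk. The file path, table name, and CSV options here are hypothetical and must match what `ConvertRecord` actually produces:

```sql
-- Hypothetical path and table; assumes a comma-separated CSV with a header row.
-- LOCAL makes the client (the NiFi host's connection) stream the file to the
-- server, so the file does not need to live on the MySQL machine; local_infile
-- must be enabled on both client and server.
LOAD DATA LOCAL INFILE '/data/nifi/staging/orders_000001.csv'
INTO TABLE orders
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES;
```

This bypasses per-row `INSERT` statements entirely, which is why it is typically much faster than `PutSQL`, at the cost of the intermediate write to disk noted in the comments above.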

0 Answers