
I am using NiFi 0.6.1 with a combination of the GetFile + SplitText + ReplaceText processors to split a 30 MB CSV file (300,000 rows).

GetFile passes the 30 MB file to SplitText very quickly.

SplitText + ReplaceText then takes 25 minutes to split the data and convert it to JSON.

So just 30 MB of data takes 25 minutes to get from CSV into SQL Server; the conversion appears to proceed byte by byte.

I have tried the Concurrent Tasks option on the processors. It speeds things up a little, but it still takes a long time, and CPU usage then hits 100%.

How can I get the CSV data into SQL Server faster?

    *"3Lakh rows"*?! What does that mean? Also, there is the native [BULK INSERT](https://msdn.microsoft.com/en-us/library/ms188365.aspx) statement lo load CSV data into SQL Server. Maybe you try this first. – Tomalak Oct 14 '16 at 07:31
  • 1
  • I can do a bulk insert in SQL Server directly, but my question is specifically about doing this with Apache NiFi processors. – Mister X Oct 14 '16 at 08:45
  • 1
  • Trying to fix the question again. Please don't just roll back edits that try to make sense of your grammar / language. – James Z Oct 27 '16 at 17:34

2 Answers


Your incoming CSV file has ~300,000 rows? You might try using multiple SplitText processors to break that down in stages. One big split can be very taxing on system resources, but dividing it into multiple stages can smooth out your flow. The typically recommended maximum is between 1,000 and 10,000 rows per split.
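For example (a sketch only; the exact split counts are illustrative and worth tuning for your flow), a two-stage split of ~300,000 rows might be configured as:

```
SplitText #1 - Line Split Count: 10000   (~300,000 rows -> ~30 flowfiles)
SplitText #2 - Line Split Count: 1000    (each 10,000-row flowfile -> 10 flowfiles)
```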

See this answer for more details.

  • I am already using 4 SplitText processors to split those rows, but with a single ReplaceText it takes a long time to process each row. For example: the input is 30 MB; after conversion to JSON it grows to 70 MB, so moving that JSON into SQL Server takes a long time. – Mister X Oct 17 '16 at 03:57

You mention splitting the data into JSON, but you're using SplitText and ReplaceText. What does your incoming data look like? Are you trying to convert to JSON to use ConvertJSONtoSQL?

If you have CSV incoming, and you know the columns, SplitText should pretty quickly split the lines, and ReplaceText can be used to create an INSERT statement for use by PutSQL.
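As a rough sketch, assuming a simple three-column CSV with no embedded commas or quotes (my_table and the column names are placeholders for your schema), the ReplaceText configuration could look like:

```
Replacement Strategy: Regex Replace
Evaluation Mode:      Entire text
Search Value:         (.+),(.+),(.+)
Replacement Value:    INSERT INTO my_table (col1, col2, col3) VALUES ('$1', '$2', '$3')
```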

Alternatively, as @Tomalak mentioned, you could try to put the CSV file somewhere where SQLServer can access it, then use PutSQL to issue a BULK INSERT statement.
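The statement itself would be ordinary T-SQL sent through PutSQL, along these lines (the table name and file path here are placeholders, and the file must be readable by the SQL Server machine itself):

```sql
-- Placeholder table and path; SQL Server must be able to read the file.
BULK INSERT dbo.my_table
FROM 'C:\staging\data.csv'
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR   = '\n',
    FIRSTROW        = 2  -- skip the CSV header row
);
```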

If neither of these is sufficient, you could use ExecuteScript to perform the split, column parsing, and translation to SQL statement(s).
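For instance, a minimal ExecuteScript sketch in Jython, again assuming a three-column CSV with a header row and no quoting to worry about (my_table and the column names are placeholders):

```python
# ExecuteScript (Jython) sketch: rewrite the incoming CSV flowfile as a
# batch of INSERT statements. Table/column names are placeholders, and
# values are not escaped -- a real script would need to handle quotes.
from org.apache.nifi.processor.io import StreamCallback
from java.io import BufferedReader, InputStreamReader
from java.nio.charset import StandardCharsets

class CsvToInserts(StreamCallback):
    def process(self, inputStream, outputStream):
        reader = BufferedReader(InputStreamReader(inputStream, StandardCharsets.UTF_8))
        reader.readLine()            # skip the header row
        line = reader.readLine()
        while line is not None:
            cols = line.split(',')   # naive split: no quoted fields
            stmt = "INSERT INTO my_table (col1, col2, col3) VALUES ('%s', '%s', '%s');\n" \
                   % (cols[0], cols[1], cols[2])
            outputStream.write(bytearray(stmt.encode('utf-8')))
            line = reader.readLine()

flowFile = session.get()
if flowFile is not None:
    flowFile = session.write(flowFile, CsvToInserts())
    session.transfer(flowFile, REL_SUCCESS)
```

Depending on the JDBC driver, PutSQL may expect one statement per flowfile, so the output might still need a (now much cheaper) line-level split before PutSQL.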

  • My incoming data is a CSV file. I am using 4 SplitText processors and one ReplaceText. Yes, I convert the data into JSON and use ConvertJSONtoSQL. PutSQL only has INSERT and UPDATE, not a BULK INSERT statement. If I am doing anything wrong, please let me know. – Mister X Oct 17 '16 at 04:01
  • PutSQL will execute any statement (except callable statements) that does not return a result set. So you can issue INSERT, UPDATE, CREATE TABLE, BULK INSERT, etc. You wouldn't want to use SplitText or ConvertJsonToSQL if you were going to use BULK INSERT. Instead you'd stage the CSV file so the DB can get to it, then send a BULK INSERT statement via PutSQL to ingest it. – mattyb Oct 18 '16 at 13:08