How can I ensure that a BULK INSERT will insert my flat file's rows in order?

Question

I have a flat file composed by several rows of text, with no field separators (I parse them by position) and with newlines as row separators. Basically, something like this:

FIELD1AFIELD2AFIELD3AFIELD4A...
FIELD1BFIELD2BFIELD3BFIELD4B...
FIELD1CFIELD2CFIELD3CFIELD4C...

I'm bulk-inserting it in a temp table before parsing it to its final table, using this command:

BULK INSERT TEMPTABLE
FROM 'c:\location\file' 
WITH (FIELDTERMINATOR= '',ROWTERMINATOR = '\n',MAXERRORS = 0)

Now, the first and last rows serve as header and trailer for the file, and contain special information. As such, they don't follow the same pattern of "FIELD1AFIELD2AFIELD3AFIELD4A...", so I need to take them into account. Furthermore, I need to insert the header row's information in another table specifically.

But when I run the bulk insert, the rows aren't inserted in order in the temporal table, so I can't just assume that the top register will be the header, and the last register the trailer.

Is there a way to force the bulk insert ot follow the order of the rows in the flat file? I don't have a "key column", nor an index, nor anything in the file, and inserting row numbers in this file is impossible (I need to keep it as it was received, and it's too large to copy or modify it anyway).

Thanks

"*But when I run the bulk insert, the rows aren't inserted in order*" How do you know this? I ask because `SELECT` isn't necessarily going to return them in the order they were inserted. — RBarryYoung, Jun 09 '16 at 13:07
What about create an identity column on your temp table? Bulk insert will insert it in order of the file, Your problem is your select that MAY not bring it in the order, so you add an identity column and select ordering by it. — Jorge Campos, Jun 09 '16 at 13:07
@RBarryYoung: well, that's the problem. When I make the transition from the temporal table to the final table, I do it through and INSERT INTO...SELECT. Also, in needing the first row for a header, I do a SELECT TOP 1 * FROM TEMPTABLE, so I know that that query isn't returning the header row — Heathcliff, Jun 09 '16 at 13:24
@JorgeCampos: if I add an identity column in the temp table, I get this error: Msg 4866, Level 16, State 1, Line 1 The bulk load failed. The column is too long in the data file for row 1, column 54. Verify that the field terminator and row terminator are specified correctly. Msg 7399, Level 16, State 1, Line 1 The OLE DB provider "BULK" for linked server "(null)" reported an error. The provider did not give any information about the error. Msg 7330, Level 16, State 2, Line 1 Cannot fetch a row from OLE DB provider "BULK" for linked server "(null)". — Heathcliff, Jun 09 '16 at 13:55
Most probably because it tried to insert the first value into the ID column. You have to use a format file, see this thread: http://stackoverflow.com/questions/14711814/ignore-certain-columns-when-using-bulk-insert — Jorge Campos, Jun 09 '16 at 14:56

RBarryYoung · Accepted Answer · 2016-06-09T13:29:42.830

4

As I recall (haven't done it in a while) here are the steps that work for me:

Don't use a Temp Table, instead use a permanent table (name Temp_<something> if you want) that you just DELETE/TRUNCATE right before you use BULK INSERT.
Alter Temp_<something> to add an ID column and make it the clustered Primary Key for this table.
Create a View:

.

CREATE VIEW Temp_something_View As
    SELECT
          (every column EXCEPT the ID column)
    FROM Temp_something

GO

.

Now BULK INSERT to the View instead of the table.

Now the rows with the lowest and highest ID values should be the first and last lines of the input source.

When you need the rows in their source input order, just Select from the table using ORDER BY ID. Otherwise it will return them according to some physical allocation constraint, rather than in the logical order you want.

edited Jun 09 '16 at 13:29

answered Jun 09 '16 at 13:17

RBarryYoung

55,398
14
96
137

if I add that column in the temporal table (as a permanent talbe) and then create the view, when I do the bulk insert I get this message: "Msg 2601, Level 14, State 1, Line 1 Cannot insert duplicate key row, the dublicate key is " – Heathcliff Jun 09 '16 at 14:05
OK, if I define the ID column as primary key with an identity, though, it does work. I'll mark your answer as the correct one, thanks. Now, I also wanted to ask: is there a performance penalty on using this approach instead of your recommended one with a clustered index? Thanks again – Heathcliff Jun 09 '16 at 14:10
@Heathcliff Not sure. Try it and see. – RBarryYoung Jun 09 '16 at 15:54
1

This is a fantastic method to avoid format files (particularly sinve BULK INSERT can't use a format file). However, can _ANYONE_ confirm that adding an IDENTITY will correctly mark the rows in order. I can't find anything authoritative that confirms this. – Nick.Mc Aug 23 '17 at 06:55
1

@Nick.McDermaid I have never seen any confirmation from Microsoft that this is guaranteed to assign the IDs in order. However, it has always worked for me (over 15 years doing it now) and the only thing that I can think of that might cause it to fail would be if it did parallel INSERTs, so I would say, just make sure that it isn't doing that. A long as there is an index on the table, that should prevent any parallel inserting. Also, although I have never tried it, the doc does say that BULK INSERT can use a FormatFile. – RBarryYoung Aug 24 '17 at 15:46
@RBarryYoung thanks. That gives me some peace of mind. – Nick.Mc Aug 24 '17 at 21:33
It probably will keep the order but it's not guaranteed. https://sqlblog.org/2008/12/16/sql-server-does-not-care-about-the-order-of-your-bulk-file – VenerableAgents Oct 03 '17 at 19:54
1

@VenerableAgents Note that what Aaron was testing is specifically *not* what I outline above. I have a clustered index on the *identity* column. I have done tests like Aaron's many times on this configuration and as long as you don't try to add any parallelism to the BULK INSERT, it works. As I noted in my answer, this is not guaranteed by Microsoft, but it's hard to see why it wouldn't work. Using an alternate clustered index as Aaron is doing would obviously give the importer a reason to re-order the rows. – RBarryYoung Oct 04 '17 at 15:48

JustBeingHelpful · Answer 2 · 2021-04-15T09:32:32.760

0

Make sure you're selecting data in the proper order from your source data set.

edited Apr 15 '21 at 09:32

answered Apr 15 '21 at 08:23

JustBeingHelpful

18,332
38
160
245

How can I ensure that a BULK INSERT will insert my flat file's rows in order?

2 Answers2