9

I'm trying to insert some data into a table from a csv document which has all of the fields delimited with ""

ie.

 APPLICANTID,NAME,CONTACT,PHONENO,MOBILENO,FAXNO,EMAIL,ADDR1,ADDR2,ADDR3,STATE,POSTCODE
 "3","Snoop Dogg","Snoop Dogg","411","","","","411 High Street","USA 
 ","","USA", "1111" "4","LL Cool J","LL Cool J","","","","","5 King
 Street","","","USA","1111"

I am using an xml format file to try and overcome the "" delimiters as I believe I would have to update the data again after importing to remove the inital " if it did not.

My format file looks like the following:

<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
 <RECORD>
  <FIELD ID="1" xsi:type="NCharTerm" TERMINATOR='",' MAX_LENGTH="12"/>
  <FIELD ID="2" xsi:type="CharTerm" TERMINATOR=',"' COLLATION="Latin1_General_CI_AS"/>
  <FIELD ID="3" xsi:type="CharTerm" TERMINATOR=',"' COLLATION="Latin1_General_CI_AS"/>
  <FIELD ID="4" xsi:type="CharTerm" TERMINATOR=',"' COLLATION="Latin1_General_CI_AS"/>
  <FIELD ID="5" xsi:type="CharTerm" TERMINATOR=',"' COLLATION="Latin1_General_CI_AS"/>
  <FIELD ID="6" xsi:type="CharTerm" TERMINATOR=',"' COLLATION="Latin1_General_CI_AS"/>
  <FIELD ID="7" xsi:type="CharTerm" TERMINATOR=',"' COLLATION="Latin1_General_CI_AS"/>
  <FIELD ID="8" xsi:type="CharTerm" TERMINATOR=',"' COLLATION="Latin1_General_CI_AS"/>
  <FIELD ID="9" xsi:type="CharTerm" TERMINATOR=',"' COLLATION="Latin1_General_CI_AS"/>
  <FIELD ID="10" xsi:type="CharTerm" TERMINATOR=',"' COLLATION="Latin1_General_CI_AS"/>
  <FIELD ID="11" xsi:type="CharTerm" TERMINATOR=',"' COLLATION="Latin1_General_CI_AS"/>
  <FIELD ID="12" xsi:type="CharTerm" TERMINATOR="\r\n" COLLATION="Latin1_General_CI_AS"/>
 </RECORD>
 <ROW>
  <COLUMN SOURCE="1" NAME="APPLICANTID" xsi:type="SQLINT"/>
  <COLUMN SOURCE="2" NAME="NAME" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="3" NAME="CONTACT" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="4" NAME="PHONENO" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="5" NAME="MOBILENO" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="6" NAME="FAXNO" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="7" NAME="EMAIL" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="8" NAME="ADDR1" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="9" NAME="ADDR2" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="10" NAME="ADDR3" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="11" NAME="STATE" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="12" NAME="POSTCODE" xsi:type="SQLCHAR"/>
 </ROW>
</BCPFORMAT>

and I am running the import with the following:

BULK INSERT [PracticalDB].dbo.applicant 
FROM 'C:\temp.csv'
WITH (KEEPIDENTITY, FORMATFILE='C:\temp.xml', FIRSTROW = 2)

I am getting the error:

Msg 4864, Level 16, State 1, Line 1 Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 2, column 1 (APPLICANTID).

for all of the rows.

I have tried various different combinations for the terminator including using:

TERMINATOR="&quot;,"
TERMINATOR="\","
TERMINATOR='","
TERMINATOR='\","

and none of them seem to work.

Is there a correct way to escape the " so that it will be parsed correctly, assuming that that is my problem here.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Daniel Powell
  • 8,143
  • 11
  • 61
  • 108

3 Answers3

21

Ok so I figured it out!

You can use ' instead of " when you are defining the xml attributes ie TERMINATOR='', then you can use the " within them without worrying.

Also I needed to eat the first " with a field so the other columns could be parsed correctly. This ended up with the format file

<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
 <RECORD>
  <FIELD ID="1" xsi:type="CharTerm" TERMINATOR='"' />
  <FIELD ID="2" xsi:type="CharTerm" TERMINATOR='","' />
  <FIELD ID="3" xsi:type="CharTerm" TERMINATOR='","' />
  <FIELD ID="4" xsi:type="CharTerm" TERMINATOR='","' />
  <FIELD ID="5" xsi:type="CharTerm" TERMINATOR='","' />
  <FIELD ID="6" xsi:type="CharTerm" TERMINATOR='","' />
  <FIELD ID="7" xsi:type="CharTerm" TERMINATOR='","' />
  <FIELD ID="8" xsi:type="CharTerm" TERMINATOR='","' />
  <FIELD ID="9" xsi:type="CharTerm" TERMINATOR='","' />
  <FIELD ID="10" xsi:type="CharTerm" TERMINATOR='","' />
  <FIELD ID="11" xsi:type="CharTerm" TERMINATOR='","' />
  <FIELD ID="12" xsi:type="CharTerm" TERMINATOR='","' />
  <FIELD ID="13" xsi:type="CharTerm" TERMINATOR='"\r\n' />
 </RECORD>
 <ROW>
  <COLUMN SOURCE="2" NAME="APPLICANTID" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="3" NAME="NAME" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="4" NAME="CONTACT" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="5" NAME="PHONENO" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="6" NAME="MOBILENO" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="7" NAME="FAXNO" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="8" NAME="EMAIL" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="9" NAME="ADDR1" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="10" NAME="ADDR2" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="11" NAME="ADDR3" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="12" NAME="STATE" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="13" NAME="POSTCODE" xsi:type="SQLCHAR"/>
 </ROW>
</BCPFORMAT>

Where the first field is just a throw away one to remove the first " and the other fields all separate on "," and the final separates on "(newline)

Daniel Powell
  • 8,143
  • 11
  • 61
  • 108
  • Thanks! This is the best solution I have found to a very common problem. I've created a similar xml file for my import and it works perfectly. – Derek Tomes Mar 18 '12 at 22:30
2

Tip: if only some of the fields are doubleqouted, then use the openrowset version of the bulk insert, and doing so, you can manipulate the field content coming from the input file before inserting into the target table.

In the manipulation you can do anything with the field content, e.g. removing double-quotes. The effect on the performance is not mentioned here, I have no measures regarding this.

SchmitzIT
  • 9,227
  • 9
  • 65
  • 92
Estevez
  • 1,014
  • 8
  • 13
1

Tip: if your CSV file don't have consistent format, for example ON THE SAME COLUMN some of the values are doubleqouted and some not than this blog will help you do it in an easy way (this is a continue to Estevez's tip as using the openrowset is just the last step) http://ariely.info/Blog/tabid/83/EntryId/122/Using-Bulk-Insert-to-import-inconsistent-data-format-using-pure-T-SQL.aspx