2

Input flat file is encoded with ISO 8859-1.

The database has default codepage 1252.

Is the only way to change codepages to use a Data Conversion transformation?

Yes, I know that 8859-1 and 1252 are very close, but I am trying to do this formally correct.

I must be missing something. It appears that I must use a Data Conversion transform, click on each and every field, and choose the codepage, then deal with the "Copy of " field names one by one. This does not seem believable.

Hadi
  • 36,233
  • 13
  • 65
  • 124
lit
  • 14,456
  • 10
  • 65
  • 119
  • This might be a solution using C#. Convert the whole file using one of these answers https://stackoverflow.com/questions/373365/how-do-i-write-out-a-text-file-in-c-sharp-with-a-code-page-other-than-utf-8. – KeithL Mar 07 '20 at 15:22

1 Answers1

0

When reading data from a flat-file there are different methods to convert the code page:

1- Saving the flat file with a different code page:

The easiest way is to open the flat file using a text editor and save it using another encoding. You may be able to do this using notepad (not sure if 1252 is supported) as an example:

2- Trying to change it from Flat File Connection Manager

Try to change the code page from the flat file connection manager, if implicit conversion is supported between code pages it may not affect the text.

3- Using a Derived column Transformation

You can add a derived column with the following expression:

(DT_STR,50,1252)[InputColumn]

4- Using a Data Conversion Transformation

As you mentioned you can use a data conversion transformation to convert the code page.

5- Use a staging table

You can import data to a table having Nvarchar columns, then use an SQL command to insert rows into the destination table (having varchar columns)

Hadi
  • 36,233
  • 13
  • 65
  • 124
  • Thank you for your reply. 1) The files are many GiB in size so using an editor is not possible. There are encoding conversion tools such as iconv, but I do not think this organization will permit its use. 2) The only encoding on the Flat File Connection Manager sets the incoming encoding. I could set it to 1252, but the input is actually 8859-1. – lit Mar 06 '20 at 22:49
  • 3 and 4) Using either a Derived Column and Data Conversion transform both appear to clicking one-by-one on every field and entering the custom expression for each. Is there any way to edit a text representation to be used other than the XML of the .dtsx file? I am reluctant to directly edit the .dtsx file. – lit Mar 06 '20 at 22:52
  • @lit I don't think there is another option. Or you have to build the package dynamically using BIML or Ezapi or DTS wrappers assemblies. Or simply try to import these data into a staging table where columns type is nvarchar, then import them to the destination table using a SQL query – Hadi Mar 06 '20 at 22:56
  • The current files for this project have 110, 182, 58, 110, 278, and 52 fields. Of course, this is only one project. – lit Mar 06 '20 at 22:56
  • @lit I think a staging table approach is more preferable in this case – Hadi Mar 06 '20 at 22:57
  • I am reluctantly going to mark this as the answer. Not because I like it, but because I fear that it is correct. I would like for someone to convince me that Microsoft understands what needs to be done to do DI, ETL, etc. – lit Mar 06 '20 at 23:04