3

I am trying to load DAT file to sql server table .A particular column of DAT file having � in a particular column of the file .I am trying to replace this character using a expression in derived column transformation . Following is the expression .

(DT_STR,1000,1252) REPLACE( [Comments] ,N'�','') 

It is was throwing error ,expression is not getting evaluated . Can anybody help me here . Many thanks

Hadi
  • 36,233
  • 13
  • 65
  • 124
Velugoti
  • 59
  • 1
  • 6
  • Sounds like you're loading the physical file with the wrong encode. Are you, for example, trying to read a UTF-8 file but specifing in SSIS that it has a UTF-16 encode? – Thom A Mar 04 '19 at 16:40
  • 2
    � is the replacement character for when an invalid sequence of code points is read. It's often just a visual placeholder and within SSIS probably not actually `U+FFFD REPLACEMENT CHARACTER`, which would make it immune to your replacement call. If the encoding of the file is not just wrong, you may need to take a closer look at the bytes of the file with a hex viewer. – Jeroen Mostert Mar 04 '19 at 16:45

1 Answers1

0

I think you should try two approaches:

(1) Changing the file encoding

As mentioned in the comments, is a replacement of a invalid character in the selected encoding.

First, check that the column data type is DT_WSTR not DT_STR**

Also, try setting the encoding to Unicode or try using a encoding detection tool such as:

But i don't think until now there is a powerful encoding detection application or library where all encodings are supported

To change the flat file connection manager encoding, you can check the Unicode checkbox or select a Code Page from the drop down list:

enter image description here

(2) Removing unwanted characters

If you don't have a problem with encodings and you know the right file encoding, if the file creation is not done correctly, these characters have no meaning and cannot be reverted, you must check the file creation operation and check if there is a problem or difference between the data source encodings or collation and the flat file encoding.

Or you should remove these characters using a script component or derived column:

(DT_WSTR,1000)REPLACE([Comments] ,"�","") 

*Note: Don't use N literal before the mark and don't convert to DT_STR since it might be other unicode characters that cannot be converted.

Additional information

Hadi
  • 36,233
  • 13
  • 65
  • 124