3

I have a batch file that imports a delimited text file into SQL Server using BCP, runs some processing using SQLCMD, and then outputs the processed data via BCP. It currently gives a string of Chinese characters instead of a delimited text table. The weird thing is that it was working two days ago and I haven't changed anything.

I've confirmed that the input BCP and SQL Server processing steps work; the SQL Server table that holds the data for output is correct. I also tried using the -C option to specify a codepage as suggested here, with no change.

Here's the BCP code in question. I've split the lines for readability, the actual code is on one line as normal. It's using union all to put the column names at the top of the file.

bcp 
"select
    'MRN'
    ,'column name 2'
    ,[rest of the column names]
union all 
select distinct 
    iif(r.MRN is not null, cast(r.MRN as varchar), '') as MRN 
    ,[rest of the columns]
from DATA_MANAGEMENT.dbo.Daily_inpat_out as d 
left join DATA_MANAGEMENT.dbo.Roster_all_match as r 
    on d.Subscriber_ID = r.SUB_ID 
where 
    r.MO_DLGTN_STA_DESC = 'DELEGATED'" 
queryout "L:\Data_Management\Import_data\Daily_inpat\inpat_out_%mydate%_%mytime%.txt"
-c -S [servername] -U [user] -P [password] -t "|"

Here's the first part of the Chinese character output, the full string is much longer:

前籎潇瑶偟潲牧浡䝼偒䥟籄䕍䉍剅也䵁籅䥂呒彈䅄䕔卼扵捳楲敢彲䑉呼彘䕓呔义彇䕄䍓

Edit to add:

As mentioned in the comments the problem occurs in Notepad but displays correctly in other applications (Word, Excel, Notepad++), so this is an encoding issue. I'm am still hoping to understand the cause of the issue and how to fix it.

Rominus
  • 1,181
  • 2
  • 14
  • 29
  • What are you using to view the output data? This is very similar to a problem a user had where his browser started displaying his opening page (the Chrome start page) as all Chinese characters. The solution? "The issue occurred when I switched chromes encoding method to Unicode UTF-16LE. Switching back to Western (ISO-8859-1) solved the issue." This is likely a similar encoding issue. – Laughing Vergil Jul 21 '17 at 15:57
  • Nah, this is in Notepad – Rominus Jul 21 '17 at 16:05
  • Just checked it in Word and Excel and...they show it right. What the...? – Rominus Jul 21 '17 at 16:07
  • I remain committed to encoding being the issue somewhere. One interesting possibility is raised by this comment: "Since text files have no metadata the encoding when opening a file is detected by an educated guess. So if you write "bush hid the facts" and save it as ANSI, when you open it again in Windows XP/NT/2000 it will display chinese characters because it thinks it's unicode". A similar issue could be happening here. Try getting Notepad++ or another more intelligent editor, and displaying the file as ASCII text. – Laughing Vergil Jul 21 '17 at 16:13
  • Aaand... nailed it. – Laughing Vergil Jul 21 '17 at 16:14
  • @LaughingVergil so yeah, encoding was the issue, but how would you fix the batch file so it outputs correctly? My other batch files output Notepad-readable files still, and Notepad is displaying other text files just fine. – Rominus Jul 21 '17 at 16:16

2 Answers2

4

This is caused by a coincidental arrangement of bytes at the front of the text file that convinces Notepad that the data is actually an Unicode double-byte text file. Theoretically, you could place a set of characters there that convince the parser that the data is actually ASCII, but that would change the format of the output.

Your best bets are, in order of ease of implementation:

  • Add one or two spaces to the front of each line [Not guaranteed]
  • Use a different program to view the output
  • Output the data as HTML, and use a browser to view it
  • Take the output as is, and use an app to make it into a .pdf file
  • Use a reporting application to generate output files

There is no guaranteed way to prevent this from happening when using Notepad to view text files. The best you can do is to try and minimize the issue, or use a different format for the final output.

Laughing Vergil
  • 3,706
  • 1
  • 14
  • 28
  • 1
    Luckily I just use Notepad to check the output, the actual use of the file occurs in Excel, mainly. I guess I'll write it off and not worry about it. – Rominus Jul 21 '17 at 16:56
0

I was experiencing the same issue (also had a union between the data sets) but was building up my SQL statements

In my case i had selected '' as SomeField for known columns where there would be no data, in the top table satisfy the union.

I changed to NULL as SomeField, and changed my isnull checks to ISNULL(field,char(32))

SET @SQL = @SQL +'ISNULL('+@Cols+ ',CHAR(32))' + ' as '+@Cols+','

the bcp output file was correct.

I suspect in the above case its the

iif(r.MRN is not null, cast(r.MRN as varchar), '') as MRN

that is causing the encoding issue, perhaps iif(r.MRN is not null, cast(r.MRN as varchar), CHAR(32)) as MRN would have fixed the issue.

F. Müller
  • 3,969
  • 8
  • 38
  • 49