2

I am trying to ingest the following .txt file with content

2|Jürgen
3|Jürgen
4|Jürgen

delimited by |. The 2nd column is accented letters, as I want to test sql table preserving accented letters.

The sql table to be ingested are defined as

CREATE TABLE [stage].[bcp_test](
[ID] [int] NOT NULL,
[TERM] [nvarchar](100) COLLATE SQL_Latin1_General_CP1_CS_AS NOT NULL)

If I do a simple insertion, insert into stage.bcp_test values(1,'Jürgen') I can see the accent Jürgen preserved in the table.

But when I tried to insert the 3 rows in the sample file through bcp

bcp stage.bcp_test in "test.txt" -S dbservername -c -t "|" -T -F 1 -b 20000 -m 1 -e error.txt

Jürgen becomes J├╝rgen.

db screenshot

Based on some suggestions, I change -c to -t. So the new bcp command becomes

bcp stage.bcp_test in "test.txt" -S dbservername -w -t "|" -T -F 1 -b 20000 -m 1 -e error.txt

However, 0 rows are copied without any error message. error message

Is there anything that I missed? I tried to search different places but cannot find a solution. Is it okay for me to bcp a .txt file using bcp -w?

=========================== Regarding Pa1's suggestion,what I got in my sql server is as follow:

Pa1 answer

Slightly different from his post, maybe because of sql version?

My Sql Server Version:

Microsoft SQL Server 2016 (SP2) (X64)

===========

Add on: Pa1's solution works. Depends on how the file is encoded.

YanTianqi
  • 31
  • 4
  • https://stackoverflow.com/questions/9220450/using-bcp-to-import-data-to-sql-server-while-preserving-accents-asian-character Here links to a very old post suggesting using bcp -w to fix the accented letters issue. But there is not much follow-up suggesting whether the issues are resolved. – YanTianqi Mar 01 '19 at 09:13
  • Pa1's solution works, without use -w. But I am still a little confused a bit this -c and -w thing. Welcome answers. – YanTianqi Mar 01 '19 at 10:45

1 Answers1

3

This seems similar to character encoding issue with the BCP and ó

  1. -w expects tab separated data based on bcp documentation. So replace pipes with tab and try?

  2. To ensure that text is inserted properly, try specifying the code page in your bcp command

-C { ACP | OEM | RAW | code_page }
  1. If you are using the -C code_page option then you need to match the code_page to the encoding of the file you are trying to ingest. The complete list is available here

Here is the trial on my system:

  • For ANSI Text
bcp stage.bcp_test in "test.txt" -S dbservername -c -C ACP -t "|" -T -F 1 -b 20000 -m 1 -e error.txt

Results before and after using ACP

  • For UTF-8 encoded text
bcp stage.bcp_test in "test.txt" -S dbservername -c -C 65001 -t "|" -T -F 1 -b 20000 -m 1 -e error.txt
  • Using -w (change text to tab sperated)
bcp stage.bcp_test in "test.txt" -S dbservername -w -T -F 1 -b 20000 -m 1 -e error.txt
Pa1
  • 31
  • 1
  • 4
  • Hi, thanks for suggesting. using -c does work. However, the premise is to preserve the accented letters, while using -c simply ignores that requirement. So the discussion is still to use -w, the unicode bulk insertion. While keeping -w, adding a -C ACP did not solve my problems. – YanTianqi Mar 01 '19 at 10:25
  • Following on, oh, I see your point of using -c -C ACP. The result is better, but still slightly off, i.e. not preserving the exact accents. I cannot attach picture here.Will pose in my main question body with the results. – YanTianqi Mar 01 '19 at 10:31
  • Hi @YanTianqi. The sample file I used was using ANSI encoding. If the encoding is UTF-8 then you are right it won't work. For UTF-8 you need to specify `-C 65001`. Could this be the fix? – Pa1 Mar 01 '19 at 10:38
  • 1
    Hi Pa1, great, by using -c 65001, I manage to preserve the accented letters. Thanks. – YanTianqi Mar 01 '19 at 10:44