0

I have to load the data from datalake into a SQL Server data warehouse using the polybase tables. I have created the set up for the creation of external tables. I have created the external tables and I am trying to do select * from ext_t1 table but I'm getting ???? for a column in ext_table.

Below is my external table script. I have found the issue with the special character in data. How can we escape the special character and need to use only varchar datatype not nvarchar. Can some help me on this issue?

CREATE EXTERNAL FILE FORMAT [CSVFileFormat_Test] WITH (FORMAT_TYPE = DELIMITEDTEXT, FORMAT_OPTIONS (FIELD_TERMINATOR = N',', STRING_DELIMITER = N'"',DATE_FORMAT='yyyy-MM-dd', FIRST_ROW = 2, USE_TYPE_DEFAULT = True,Encoding='UTF8'))

CREATE EXTERNAL TABLE [dbo].[EXT_TEST1]
( A VARCHAR(10),B VARCHAR(20)) 
(DATA_SOURCE = [Azure_Datalake],LOCATION = N'/A/Test_CSV/',FILE_FORMAT =csvfileformat,REJECT_TYPE = VALUE,REJECT_VALUE = 1)

Data: (special character in csv for A column as follows)

ÐК Ð’ÐЗМ Завод
ÐК Ð’ÐЗМ ЗаÑтройщик
marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
pythonUser
  • 183
  • 2
  • 7
  • 20

1 Answers1

-1

This is data mismatch issue and this read may help you .

External Table Considerations

Creating an external table is easy, but there are some nuances that need to be discussed.

External Tables are strongly typed. This means that each row of the data being ingested must satisfy the table schema definition. If a row does not match the schema definition, the row is rejected from the load.

The REJECT_TYPE and REJECT_VALUE options allow you to define how many rows or what percentage of the data must be present in the final table. During load, if the reject value is reached, the load fails. The most common cause of rejected rows is a schema definition mismatch. For example, if a column is incorrectly given the schema of int when the data in the file is a string, every row will fail to load.

Data Lake Storage Gen1 uses Role Based Access Control (RBAC) to control access to the data. This means that the Service Principal must have read permissions to the directories defined in the location parameter and to the children of the final directory and files. This enables PolyBase to authenticate and load that data.

HimanshuSinha
  • 1,650
  • 2
  • 6
  • 10