In SQL Data Warehouse (editors please don't change this, it is the actual name see: here) I have a JobCandidate_ext
external table that looks like this.
CREATE EXTERNAL TABLE [HumanResources].[JobCandidate_ext](
[JobCandidateID] int,
[BusinessEntityID] int,
[Resume] Varchar(8000),
[ModifiedDate] Datetime
)
WITH (
LOCATION='/[HumanResources].[JobCandidate]/data.txt',
DATA_SOURCE=AzureStorage,
FILE_FORMAT=TextFile)
GO
The column [Resume]
was an XML
type in SQL Server but in SQL Data Warehouse XML types should be converted to varchar(8000)
as described here.
I am using a flat file data.txt
to export the data to a blob and then create an external table from it.
The [Resume]
column has carriage returns in it (as expected from an XML file), and so when you run a SELECT * FROM [HumanResources].[JobCandidate_ext]
you get an error. In this case:
Query aborted-- the maximum reject threshold (0 rows) was reached while reading from an external source: 1 rows rejected out of total 2 rows processed.
(/[HumanResources].[JobCandidate]/data.txt)Column ordinal: 0, Expected data type: INT, Offending value: some text .... (Column Conversion Error), Error: Error converting data type NVARCHAR to INT.
I know that I cannot configure a row delimiter when creating external tables as described here.
The row delimiter must be UTF-8 and supported by Hadoop’s LineRecordReader. The row delimiter must be either '\r', '\n', or '\r\n'. These are not user-configurable.
And if you try to put quotes on each column field you get this error while selecting rows from the external table: No closing string delimiter
.
Query aborted-- the maximum reject threshold (0 rows) was reached while reading from an external source: 1 rows rejected out of total 1 rows processed.
(/[HumanResources].[JobCandidate]/data.txt)Column ordinal: 2, Expected data type: VARCHAR(8000) collate SQL_Latin1_General_CP1_CI_AS, Offending value: 'ShaiBassli (Tokenization failed), Error: No closing string delimiter.
Is there a way to get around this issue?