
I've compressed a text file in gzip format using PowerShell and uploaded it to Azure Blob Storage. When I query the external table I get the following error, but the offending value is empty. Can anyone tell me what the issue is and how I can find the error row?

Note: after the issue occurred I decompressed the file and checked it, but I did not see any problem with the rows.

Please click here to look at the error

  • Can you please provide more information than is in the question above? At least show the code for creating the external table, the external data source, and the external file format. Just a thought: did you define the compression when you created the file format? – Niels Berglund Jan 28 '20 at 18:49
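
For reference, compression has to be declared on the external file format itself; PolyBase then decompresses the .gz blobs transparently. A minimal sketch with placeholder names and a placeholder terminator (the asker's actual definitions appear in the comments under the answer below):

    -- Gzip is declared via DATA_COMPRESSION on the file format;
    -- the name and field terminator here are placeholders.
    CREATE EXTERNAL FILE FORMAT TextFile_GzipComp_Example
    WITH (
        FORMAT_TYPE = DELIMITEDTEXT,
        FORMAT_OPTIONS (FIELD_TERMINATOR = '|'),   -- placeholder terminator
        DATA_COMPRESSION = 'org.apache.hadoop.io.compress.GzipCodec'
    );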

1 Answer


My read on that error is that one row has a blank value for (I think) the first column. Since you have it declared as SMALLINT NOT NULL, it fails. Can you try changing that column to NULL?
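
Note that an external table's columns can't be altered in place, so making the column nullable means dropping and recreating the external table definition (the file in blob storage is untouched). A minimal sketch with a hypothetical table shape; the asker's actual DDL appears in the comments below:

    -- Hypothetical shape: substitute your real columns. Only the
    -- nullability of the first column changes; the data source and
    -- file format are assumed to already exist.
    DROP EXTERNAL TABLE dbo.MyExternalTable;

    CREATE EXTERNAL TABLE dbo.MyExternalTable (
        Column1 smallint NULL,   -- was SMALLINT NOT NULL
        Column2 char(10) NULL
    )
    WITH (
        DATA_SOURCE = AzureStorage,
        LOCATION = N'/DB000B/DB000B_09_1_2019.txt.gz',
        FILE_FORMAT = TextFile_GzipComp,
        REJECT_TYPE = VALUE,
        REJECT_VALUE = 0
    );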

Upon further troubleshooting, I believe we determined the issue was special characters: fixing the file encoding and removing the special characters solved it.
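
If file encoding is the culprit, one option is to pin the encoding explicitly on the file format, assuming your PolyBase version supports the ENCODING format option (it may not be available on SQL Server 2016). A sketch, not the asker's actual definition:

    -- PolyBase delimited text is expected to be UTF-8 or UTF-16;
    -- declaring it explicitly avoids surprises from ANSI or
    -- oddly encoded files.
    CREATE EXTERNAL FILE FORMAT TextFile_GzipComp_Utf8
    WITH (
        FORMAT_TYPE = DELIMITEDTEXT,
        FORMAT_OPTIONS (
            FIELD_TERMINATOR = N'|~|',
            USE_TYPE_DEFAULT = False,
            ENCODING = 'UTF8'
        ),
        DATA_COMPRESSION = N'org.apache.hadoop.io.compress.GzipCodec'
    );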

GregGalloway
  • I tried with NULL and also varchar(250) NULL, but I'm getting the same error again. – praveen pasumarthi Jan 31 '20 at 16:58
  • When I change the column data type to varchar(250) NULL, it shows the following error (instead of smallint it now says varchar(250)): Cannot execute the query "Remote Query" against OLE DB provider "SQLNCLI11" for linked server "(null)". Query aborted-- the maximum reject threshold (0 rows) was reached while reading from an external source: 1 rows rejected out of total 23294 rows processed. (/DB000B/DB000B_09_1_2019.txt.gz)Column ordinal: 0, Expected data type: VARCHAR(250) – praveen pasumarthi Jan 31 '20 at 17:07
  • @praveenpasumarthi please include the definition of the external table and external file format – GregGalloway Jan 31 '20 at 17:22
  • CREATE EXTERNAL TABLE DB000B_Trim_Ext_prod (
        Column1 smallint NOT NULL,
        Column2 char(10) NOT NULL,
        Column3 smallint NOT NULL,
        Column4 datetime NOT NULL,
        Column5 int NULL,
        Column6 int NULL,
        Column7 int NULL,
        Column8 datetime NULL,
        Column9 char(1) NULL,
        Column10 char(1) NULL,
        Column11 money NULL,
        Column12 varchar(30) NULL,
        Column13 char(1) NULL,
        Column14 char(1) NULL,
        Column15 varchar(35) NULL,
        Column16 bigint NULL
    )
    WITH (
        DATA_SOURCE = AzureStorage,
        LOCATION = N'/DB000B/DB000B_09_1_2019.txt.gz',
        FILE_FORMAT = TextFile_GzipComp,
        REJECT_TYPE = VALUE,
        REJECT_VALUE = 0
    )
    GO
    – praveen pasumarthi Feb 03 '20 at 12:12
  • CREATE EXTERNAL FILE FORMAT [TextFile_GzipComp]
    WITH (
        FORMAT_TYPE = DELIMITEDTEXT,
        FORMAT_OPTIONS (
            FIELD_TERMINATOR = N'|~|',
            DATE_FORMAT = N'yyyy-MM-dd HH:mm:ss.fff',
            USE_TYPE_DEFAULT = False
        ),
        DATA_COMPRESSION = N'org.apache.hadoop.io.compress.GzipCodec'
    )
    – praveen pasumarthi Feb 03 '20 at 12:15
  • @praveenpasumarthi I would change all columns to NULL and try again. If that doesn't do it, I would try the rejected-row-location option to send any problem rows to a new file for troubleshooting. https://azure.microsoft.com/en-us/blog/load-confidently-with-sql-data-warehouse-polybase-rejected-row-location/ – GregGalloway Feb 03 '20 at 12:57
  • I kept all columns NULL but the error still occurs. We are currently using SQL Server 2016, which doesn't have the REJECTED_ROW_LOCATION option. – praveen pasumarthi Feb 04 '20 at 10:23
  • @praveenpasumarthi oh, I see. Then I would try the following experiments. I would insert a few rows into the external table, then go find the file it outputs, un-gzip it, and compare the files carefully. Is it a different encoding: UTF-8 vs UTF-16 or ANSI? I have definitely seen strangely formatted characters break PolyBase, and line breaks in a field value break PolyBase too. Then I would edit the problem file and reduce the number of rows until you figure out which is the problem row; it could be the first one. If it's not sensitive data, or if you can mock up a file with fake data, feel free to post it (see the sketch after this thread for isolating the bad row). – GregGalloway Feb 04 '20 at 12:34
  • 1
    Thanks for your input . I figured out the issue . Some formatted characters are presented in the data which breaks the polybase . Thank you very much – praveen pasumarthi Feb 04 '20 at 16:42
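
For readers on SQL Server 2016 (which, as noted above, lacks REJECTED_ROW_LOCATION), here is a workaround sketch based on the advice in this thread: raise the reject threshold so a bad row is skipped instead of aborting the whole query, materialize whatever loads, and diff that against the decompressed source file to isolate the rejected row. The table shape and names are taken from the DDL in the comments; DB000B_Trim_Ext_debug and DB000B_Trim_Loaded are hypothetical names, and REJECT_VALUE = 100 is an arbitrary tolerance, not a recommendation.

    -- Debug copy of the external table: every column nullable and a
    -- tolerant reject threshold, so one bad row no longer fails the query.
    CREATE EXTERNAL TABLE DB000B_Trim_Ext_debug (
        Column1  smallint    NULL,
        Column2  char(10)    NULL,
        Column3  smallint    NULL,
        Column4  datetime    NULL,
        Column5  int         NULL,
        Column6  int         NULL,
        Column7  int         NULL,
        Column8  datetime    NULL,
        Column9  char(1)     NULL,
        Column10 char(1)     NULL,
        Column11 money       NULL,
        Column12 varchar(30) NULL,
        Column13 char(1)     NULL,
        Column14 char(1)     NULL,
        Column15 varchar(35) NULL,
        Column16 bigint      NULL
    )
    WITH (
        DATA_SOURCE = AzureStorage,
        LOCATION = N'/DB000B/DB000B_09_1_2019.txt.gz',
        FILE_FORMAT = TextFile_GzipComp,
        REJECT_TYPE = VALUE,
        REJECT_VALUE = 100   -- tolerate up to 100 rejected rows instead of 0
    );

    -- Materialize the rows PolyBase could parse; the rejected row is the
    -- one missing here compared with the decompressed file (23,294 rows).
    SELECT *
    INTO dbo.DB000B_Trim_Loaded
    FROM DB000B_Trim_Ext_debug;

    SELECT COUNT(*) AS loaded_rows FROM dbo.DB000B_Trim_Loaded;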