
We have a dataflow that ran fine for the last 30 days but failed last night. The error message from the pipeline reads:

{
    "StatusCode": "DFExecutorUserError",
    "Message": "Job failed due to reason: The given value of type NVARCHAR(255) from the data source cannot be converted to type nvarchar(100) of the specified target column.",
    "Details": "shaded.msdataflow.com.microsoft.sqlserver.jdbc.SQLServerException: The given value of type NVARCHAR(255) from the data source cannot be converted to type nvarchar(100) of the specified target column.\n\tat shaded.msdataflow.com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.validateStringBinaryLengths(SQLServerBulkCopy.java:1690)\n\tat shaded.msdataflow.com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.writeColumn(SQLServerBulkCopy.java:3006)\n\tat shaded.msdataflow.com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.writeBatchData(SQLServerBulkCopy.java:3647)\n\tat shaded.msdataflow.com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.doInsertBulk(SQLServerBulkCopy.java:1566)\n\tat shaded.msdataflow.com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.access$300(SQLServerBulkCopy.java:65)\n\tat shaded.msdataflow.com.microsoft.sqlserver.jdbc.SQLServerBulkCopy$1InsertBulk.doExecute(SQLServerBulkCopy.java:663)\n\tat shaded.msdataflow.com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7418)\n\tat shaded.msdataflow.com.microsoft.sqlserver.jd"
}

The error message is not very precise: it does not name the column that fails the length check, which makes troubleshooting a bit more difficult.

Both the destination and source tables are hosted in the same Azure SQL Database. The destination table contains two columns with length 100. The corresponding source fields are declared as varchar(280) and varchar(290), but the actual data in these columns is at most 30 and 40 characters long.
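For reference, the 30/40 figures come from a quick length check along these lines (a sketch, not necessarily the exact query we ran; column names are from the source table below):

--Maximum character length per source column, as reported by LEN
SELECT MAX(LEN(USRS_FULL_NAME))     AS max_full_name,     -- 30
       MAX(LEN(USRS_EMAIL_ADDRESS)) AS max_email_address  -- 40
FROM pq.USRS;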

--Source table
CREATE TABLE [pq].[USRS](
    [USRS_ID] [varchar](100) NULL,
    [USRS_FULL_NAME] [varchar](280) NULL,
    /* STRIPPED NON-RELEVANT COLUMNS */
    [USRS_EMAIL_ADDRESS] [varchar](290) NULL
    /* STRIPPED NON-RELEVANT COLUMNS */
) ON [PRIMARY]
GO

--Destination table
CREATE TABLE [fin].[l20_dim_gebruiker](
    [id] [int] IDENTITY(1,1) NOT NULL,
    [gebruiker_id] [int] NOT NULL,
    [volledige_naam] [varchar](100) NULL,
    [emailadres] [varchar](100) NULL,
    /* STRIPPED NON-RELEVANT COLUMNS */
 CONSTRAINT [PK__l20_dim___3213E83F139E41DB] PRIMARY KEY CLUSTERED 
(
    [id] ASC
)WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO

What could be causing this error message? How could we make the dataflow transformation more robust?

Koen
  • Can you share schemas for both your destination and source tables? The dataset schemas in ADF would be helpful too – ray Nov 09 '21 at 09:20
  • Today, the pipeline failed again. It seems that it is a persistent problem. – Koen Nov 10 '21 at 11:48

1 Answer


OK, so the error was on my side: the check I had done on the source table was not conclusive.

SELECT MAX(LEN(USRS_FULL_NAME))        AS USRS_FULL_NAME_LEN,
       MAX(DATALENGTH(USRS_FULL_NAME)) AS USRS_FULL_NAME_DATALENGTH
FROM pq.USRS;

Result from the query above:

+--------------------+---------------------------+
| USRS_FULL_NAME_LEN | USRS_FULL_NAME_DATALENGTH |
+--------------------+---------------------------+
|                 30 |                       250 |
+--------------------+---------------------------+

The T-SQL LEN function does not count trailing spaces; see the MSFT Docs or this workaround on SO. DATALENGTH, by contrast, returns the number of bytes actually stored, trailing spaces included, which is why the two results differ so dramatically.
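A minimal demo of the difference (the literal value is made up for illustration):

--LEN ignores trailing spaces; DATALENGTH counts every byte of the varchar value
SELECT LEN('John Doe   ')        AS len_result,        -- 8
       DATALENGTH('John Doe   ') AS datalength_result; -- 11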

My fix is now to trim the affected string columns using a DerivedColumn transformation.

[Screenshot: DerivedColumn transformation in the data flow]
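For illustration, the expressions in the DerivedColumn transformation look roughly like this (a sketch in the ADF data flow expression language; trim() removes leading and trailing whitespace, and the column names are taken from the source table above):

USRS_FULL_NAME     : trim(USRS_FULL_NAME)
USRS_EMAIL_ADDRESS : trim(USRS_EMAIL_ADDRESS)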

Koen
  • Alternatively, one can always use `sp_help [dbo.]` in order to get further information from a given db.
    – basquiatraphaeu Jan 24 '22 at 15:13
  • I am facing the same issue as of now and have no idea how to figure out which column is throwing this exception. Most of my `nvarchar` columns have the same `nvarchar(n)` and resizing all these columns seems not the way to go here. – basquiatraphaeu Jan 24 '22 at 16:24