
I am trying to copy data from an on-prem SQL Server to Azure Data Lake Storage (ADLS) via Azure Data Factory (ADF). Everything seems to work, except that when I run (debug or trigger) the pipeline, I get the error:

{
    "errorCode": "2200",
    "message": "Failure happened on 'Sink' side. ErrorCode=UserErrorAdlsFileWriteFailed,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Writing to 'AzureDataLakeStore' failed. Message: The remote server returned an error: (411) Length Required.. Response details: Length Required\r\nHTTP Error 411. The request must be chunked or have a content length.,Source=Microsoft.DataTransfer.ClientLibrary,''Type=System.Net.WebException,Message=The remote server returned an error: (411) Length Required.,Source=System,'",
    "failureType": "UserError",
    "target": "CopyData1"
}
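
For reference (this is generic HTTP, not ADF-specific): a 411 means the server, or an intermediate proxy, received a request whose body length it could not determine, i.e. the request declared neither a Content-Length header nor Transfer-Encoding: chunked. A rough sketch of the mechanics, using a hypothetical host purely for illustration (whether you actually get a 411 back depends entirely on the server or proxy in the path; some reply 400 or just ignore the trailing bytes):

import socket

# HTTP/1.1 POST that carries a body but declares neither Content-Length
# nor Transfer-Encoding: chunked. A strict server or intercepting proxy
# cannot tell where the body ends and may answer "411 Length Required".
HOST = "example.com"  # hypothetical host, only for illustration
request = (
    "POST /upload HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "Connection: close\r\n"
    "\r\n"             # headers end here: no length information given
    "some body bytes"  # ...yet a body follows
)

with socket.create_connection((HOST, 80), timeout=10) as s:
    s.sendall(request.encode("ascii"))
    print(s.recv(4096).decode("ascii", errors="replace"))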

What is really odd is that the following pipelines DO work:

  • SQL tbl1 -> SQL tbl2
  • ADLS source.txt -> ADLS sink.txt

I.e. read/write access works as expected. The latter pipeline is also able to create/overwrite the sink.txt file.

But when I run the pipeline

  • SQL tbl1 -> sink.txt

I get the Length Required error. And if sink.txt exists, the pipeline even deletes it!

I'm using ADF v2 and ADLS Gen1; ADF and ADLS reside in the same subscription/resource group. I use a self-hosted Integration Runtime for SQL and an Azure Integration Runtime for ADLS. I have tested with a source statement as simple as "SELECT 1 Col", and I have tested both without a dataset schema and with schemas + mappings.

Is this a bug, or am I missing something? Which “Length” is required?


EDIT 1: Minimal JSON scripts

pipeline1.json

{
    "name": "pipeline1",
    "properties": {
        "activities": [
            {
                "name": "CopyData1",
                "type": "Copy",
                "policy": {
                    "timeout": "7.00:00:00",
                    "retry": 0,
                    "retryIntervalInSeconds": 30,
                    "secureOutput": false,
                    "secureInput": false
                },
                "typeProperties": {
                    "source": {
                        "type": "SqlSource",
                        "sqlReaderQuery": "SELECT TOP 1 'x' AS col1 FROM sys.tables"
                    },
                    "sink": {
                        "type": "AzureDataLakeStoreSink"
                    },
                    "enableStaging": false,
                    "dataIntegrationUnits": 0
                },
                "inputs": [
                    {
                        "referenceName": "table1",
                        "type": "DatasetReference"
                    }
                ],
                "outputs": [
                    {
                        "referenceName": "sink1",
                        "type": "DatasetReference"
                    }
                ]
            }
        ]
    }
}

table1.json

{
    "name": "table1",
    "properties": {
        "linkedServiceName": {
            "referenceName": "SqlServer1",
            "type": "LinkedServiceReference"
        },
        "type": "SqlServerTable",
        "typeProperties": {
            "tableName": "sys.tables"
        }
    }
}

sink1.json

{
    "name": "sink1",
    "properties": {
        "linkedServiceName": {
            "referenceName": "AzureDataLakeStore1",
            "type": "LinkedServiceReference"
        },
        "type": "AzureDataLakeStoreFile",
        "structure": [
            {
                "name": "col1",
                "type": "String"
            }
        ],
        "typeProperties": {
            "format": {
                "type": "TextFormat",
                "columnDelimiter": ",",
                "rowDelimiter": "",
                "nullValue": "\\N",
                "treatEmptyAsNull": true,
                "skipLineCount": 0,
                "firstRowAsHeader": true
            },
            "fileName": "sink1.txt",
            "folderPath": "myDir"
        }
    }
}

EDIT 2: Summary of conducted tests

  • SQL -> ADLS: Error
  • Oracle -> ADLS: Error
  • SQL -> Blob: OK
  • Oracle -> Blob: OK
  • SQL -> SQL: OK
  • ADLS -> ADLS: OK
  • AzureSQLDB -> ADLS: OK

1 Answer

Does your self-hosted IR have a proxy setting or go through a special network configuration? Such an error is typically caused by an intermediate proxy when ADF's ADLS connector tries to talk to the ADLS service.

Yingqin
  • I am not aware of any special proxy or network settings (unless you can be more specific). The self-hosted IR is installed on the development machine, but on the same network as the on-prem servers. – Martin Thøgersen Nov 23 '18 at 14:41
  • Your comment seems a bit contradictory. How can the source side (self-hosted IR and a possible proxy issue) affect the sink side (Azure IR and ADLS)? – Martin Thøgersen Nov 23 '18 at 14:56
  • [SOLVED]: Whitelist `azuredatalakestore.net` in the proxy between the Self-hosted IR and ADLS. (In our case it was McAfee Web Gateway blocking it.) A key step in the solution was to understand that once ADF activities have been initialized by ADF, the data/communication goes **directly from Self-hosted IR to ADLS**, not through ADF. We initially misunderstood this. [Self-hosted IR](https://learn.microsoft.com/en-us/azure/data-factory/create-self-hosted-integration-runtime#ports-and-firewall) – Martin Thøgersen Dec 05 '18 at 11:41
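
A quick way to verify from the self-hosted IR machine that HTTPS traffic to azuredatalakestore.net actually gets through the proxy is a minimal reachability probe, as sketched below. The account name "myadls" is a placeholder, and the probe only checks that the endpoint answers through whatever proxy the standard HTTP_PROXY/HTTPS_PROXY environment variables point at; it does not authenticate against ADLS.

import urllib.error
import urllib.request

# Reachability probe from the self-hosted IR host. "myadls" is a placeholder
# ADLS Gen1 account name; replace it with the real one. urllib honors the
# HTTP_PROXY/HTTPS_PROXY environment variables, so the request takes the
# same kind of proxy path the copy activity's traffic would.
url = "https://myadls.azuredatalakestore.net/"

try:
    with urllib.request.urlopen(url, timeout=10) as resp:
        print("Reached endpoint, HTTP status:", resp.status)
except urllib.error.HTTPError as e:
    # Any HTTP status (401/403/404/...) proves the endpoint was reached;
    # a 411 here would point at an intercepting proxy rewriting requests.
    print("Reached endpoint, HTTP status:", e.code)
except urllib.error.URLError as e:
    print("Could not reach endpoint:", e.reason)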