
The pipeline is supposed to copy several tables from an on-prem SQL Server to ADLS Parquet files (dropped and recreated on each run). The intake relies on a self-hosted integration runtime (IR). All tests during configuration succeed (table structure is retrieved successfully, connection tests are all green, etc.). The pipeline, however, fails at execution with the following error:

Activity Copy_DetailPricingLevelHierarchy failed: Failure happened on 'Sink' side. ErrorCode=UserErrorFailedFileOperation,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Upload file failed at path Intake/MySource\PricingLevelHierarchy.,Source=Microsoft.DataTransfer.Common,''Type=System.InvalidOperationException,Message=Internal connection fatal error. Error state: 18,Source=System.Data,'

I'm failing to understand what is actually failing.
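
For reference, the sink dataset is defined roughly like this (the dataset and linked service names below are placeholders; the folder and file values are the actual ones, and the type assumes ADLS Gen1 - for Gen2 it would be AzureBlobFSFile):

```json
{
    "name": "DS_ADLS_PricingLevelHierarchy",
    "properties": {
        "description": "Sink dataset - file is dropped and recreated on each run",
        "type": "AzureDataLakeStoreFile",
        "linkedServiceName": {
            "referenceName": "LS_ADLS",
            "type": "LinkedServiceReference"
        },
        "typeProperties": {
            "folderPath": "Intake/MySource",
            "fileName": "PricingLevelHierarchy.parquet",
            "format": {
                "type": "ParquetFormat"
            }
        }
    }
}
```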

Validations and attempts to fix the issue:

  • Input and output connection validation from ADF (success)
  • Connection validation from the IR VM (success)
  • Supplying the Service Principal key to ADF directly, to rule out an Azure Key Vault authentication issue (no change; connection validation remains successful)
  • Replacing the VM-hosted IR with an auto-set-up IR on my personal machine (upload successful)
  • Checking that outbound ports 80 and 443 are open on the VM (success)

Please advise what other configuration might be causing the sink failure.

Dan
  • Do you have a typo in your file path? The path in the error message has both a forward slash and a backslash: `Intake/MySource\PricingLevelHierarchy` – DavidP May 03 '19 at 17:37
  • @DavidP The path is correct - ADF JSON keeps the folder path in one property (`"folderPath": "Intake/MySource"`) and the target file in another (`"fileName": "PricingLevelHierarchy.parquet"`). The funny slash mismatch shows up only in the error message. (Also, the upload works fine once the VM-based IR is replaced with one installed on my machine.) – Dan May 05 '19 at 02:19
  • I'd try swapping the slash type anyway just to be sure. Have you validated access to the on-prem SQL server from the VM using the same account but with a different tool such as SSMS? – DavidP May 06 '19 at 20:46
  • @DavidP Yes, access was tested - exactly the same settings work as expected when using the IR on a different machine... The issue seems to have been caused by the absence of Java on the VM! As soon as I finish testing, I'll post the full answer here (I still can't believe it, because there is no Java mentioned in the IR prerequisites)… Thank you David for the ideas and support! – Dan May 07 '19 at 21:45
  • Java isn't a prerequisite for the IR, it's a prerequisite for using the Parquet format. It's mentioned here: [ADF Supported File Formats: Parquet](https://learn.microsoft.com/en-us/azure/data-factory/supported-file-formats-and-compression-codecs#parquet-format) – DavidP May 08 '19 at 14:14
  • @DavidP That's what was missing. Would you like to post your comment as an answer? Installing OpenJDK or the paid JRE is the solution. Thank you! – Dan May 08 '19 at 19:09
  • Glad you found the issue. It's unfortunate that the error message does not provide clearer information about what caused the error. It might still be helpful for future visitors if you write up how you figured out that not having Java was what led to this specific error. – DavidP May 08 '19 at 20:45
  • I'm also facing the same issue, but both my source and target folders are in the same ADLS container, so it does not seem to be an IR/Java issue. I have plain txt files under several folders; some folders get copied, but for the rest I get the same error. Any ideas? – Manoj Pandey Jan 08 '21 at 06:44

1 Answer


To use the Parquet format with a self-hosted Integration Runtime, you need a Java Runtime Environment (JRE) installed on the IR machine.

From the Parquet format section of the Microsoft documentation on Azure Data Factory supported file formats:

> For copy empowered by Self-hosted Integration Runtime e.g. between on-premises and cloud data stores, if you are not copying Parquet files as-is, you need to install the 64-bit JRE 8 (Java Runtime Environment) or OpenJDK on your IR machine.

[Source: ADF Supported File Formats - Parquet format](https://learn.microsoft.com/en-us/azure/data-factory/supported-file-formats-and-compression-codecs#parquet-format)
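
For context on why this fixes the error: the Parquet serialization runs on whichever machine executes the copy. When the source linked service is bound to the self-hosted IR through `connectVia`, that machine is the IR VM, so that is where the JRE has to be installed - which also explains why an IR on a different machine (one that happened to have Java) uploaded fine. A minimal sketch of such a linked service, with illustrative names and connection string:

```json
{
    "name": "LS_OnPremSqlServer",
    "properties": {
        "description": "Source linked service - connectVia pins execution to the self-hosted IR",
        "type": "SqlServer",
        "typeProperties": {
            "connectionString": "Server=myServer;Database=myDb;Integrated Security=True"
        },
        "connectVia": {
            "referenceName": "SelfHostedIR",
            "type": "IntegrationRuntimeReference"
        }
    }
}
```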

DavidP