I'm spinning up a Windows Container image (provided by Microsoft here) containing the Self-Hosted Integration Runtime so that I can use ADF in an on-premises scenario. It ran smoothly until I needed to use Parquet files.
When I pointed the output to a .parquet file, I got a Data Factory task failure pointing to the absence of Java in the Integration Runtime container.
ErrorCode=JreNotFound,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Java Runtime Environment cannot be found on the Self-hosted Integration Runtime machine. It is required for parsing or writing to Parquet/ORC files. Make sure Java Runtime Environment has been installed on the Self-hosted Integration Runtime machine.,Source=Microsoft.DataTransfer.Common,''Type=System.DllNotFoundException,Message=Unable to load DLL 'jvm.dll': The specified module could not be found.
I went down the path of modifying the build.ps1 file to install and configure the dependencies while the container image is built. These are the steps I took:
Install Microsoft Visual C++ 2010 Service Pack 1 (here)
Install the JDK provided by Microsoft, OpenJDK 17.0.6 LTS, 64-bit MSI (here)
Manually set JAVA_HOME environment variable:
setx -m JAVA_HOME "C:\Program Files\Microsoft\jdk-17.0.6.10-hotspot"
(As far as I understand, SHIR looks in the registry for the JRE location and, if it is not found there, falls back to the JAVA_HOME environment variable.)
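To make the lookup I'm assuming concrete, here is a minimal Python sketch of it. It only models the order described above (registry value first, JAVA_HOME as fallback) and the place where a 64-bit JDK 9+ normally keeps jvm.dll (bin\server). candidate_jvm_dll is a hypothetical helper, not SHIR's actual code, and the registry read itself is not implemented.

```python
import os
from pathlib import Path, PureWindowsPath
from typing import Optional


def candidate_jvm_dll(registry_java_home: Optional[str],
                      java_home_env: Optional[str]) -> Optional[PureWindowsPath]:
    """Return where jvm.dll would be expected, or None if no Java root is known.

    Hypothetical helper: it mirrors the lookup order I believe SHIR uses
    (registry first, JAVA_HOME as fallback); it is not SHIR's real code.
    """
    # The registry value wins if present; an empty string counts as "not set".
    root = registry_java_home or java_home_env
    if not root:
        return None
    # A 64-bit JDK 9+ (e.g. Microsoft OpenJDK 17) keeps jvm.dll in bin\server.
    return PureWindowsPath(root) / "bin" / "server" / "jvm.dll"


if __name__ == "__main__":
    # Registry reading is not modeled here; only JAVA_HOME is checked.
    dll = candidate_jvm_dll(None, os.environ.get("JAVA_HOME"))
    if dll is None:
        print("JAVA_HOME is not set")
    else:
        # The existence check is only meaningful when run inside the Windows container.
        print(dll, "exists" if Path(str(dll)).exists() else "is missing")
```

The check has to be run inside the container, and it assumes Python is available there, which the SHIR image does not necessarily provide.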
Java itself seems to be working fine: when I run java -version, it returns the following output:
openjdk version "17.0.6" 2023-01-17 LTS
OpenJDK Runtime Environment Microsoft-7209853 (build 17.0.6+10-LTS)
OpenJDK 64-Bit Server VM Microsoft-7209853 (build 17.0.6+10-LTS, mixed mode, sharing)
Everything seems to be OK, but I keep getting the error mentioned above. I also tried installing JRE 7 and JRE 8 and configuring the registry keys, but nothing seems to work.