We have an Azure Data Factory with around 40 pipelines. Each pipeline has an activity that runs a Hive script using an HDInsight on-demand linked service.
We've since added a custom metastore, which uses an Azure SQL instance as its storage, referenced by a linked service. The only thing we set to achieve this is HcatalogLinkedServiceName in a Data Factory ARM script.
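For reference, here is a minimal sketch of the on-demand linked service definition with the metastore property set. This is only the linked service fragment, not the full ARM template, and the linked service names, cluster size, and time-to-live values are placeholders rather than our real configuration:

```json
{
  "name": "HDInsightOnDemandLinkedService",
  "properties": {
    "type": "HDInsightOnDemand",
    "typeProperties": {
      "version": "3.2",
      "osType": "Linux",
      "clusterSize": 4,
      "timeToLive": "00:30:00",
      "linkedServiceName": "AzureStorageLinkedService",
      "hcatalogLinkedServiceName": "AzureSqlMetastoreLinkedService"
    }
  }
}
```

Here AzureSqlMetastoreLinkedService stands for the Azure SQL linked service that points at the metastore database.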
Since doing this, we have noticed a rise in activity failures, and performance has degraded (this may have been due to the Azure SQL tier we were using previously, which was spiking to 100% DTU usage).
The activities are failing with the following message: Error in Activity: The request was aborted: The request was canceled.
This seems to be fairly consistent when using a custom metastore and doesn't seem to be an issue when using the default.
Is there anything further required to configure a custom metastore that we may have missed, or anything about the configuration and usage described here that could explain the significant rise in failures with the above error?
The on-demand HDInsight cluster runs Linux, and the HDInsight version is 3.2.
We also had some more severe errors, but I think this may have been because we connected a 3.3 HDI cluster to the same metastore as our 3.2 on-demand cluster, which is something this document advises against:
Ensure that the Metastore created for one HDInsight cluster version is not shared across different HDInsight cluster versions. This is due to different Hive versions has different schemas. Example – Hive 1.2 and Hive 2.1 clusters trying to use same Metastore.
The following page also seems to suggest that there are breaking changes between 3.2 and 3.3 for Hive and HCatalog:
https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-component-versioning