We have an Azure Data Factory with around 40 pipelines in it. Each pipeline has an activity that runs a Hive script using an HDInsight on-demand linked service.

We've since added a custom metastore, which is backed by an Azure SQL instance referenced through a linked service. The only thing we have set to achieve this is HcatalogLinkedServiceName in the data factory ARM script.
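
For illustration, an on-demand HDInsight linked service with a custom metastore might look roughly like the fragment below. This is a sketch under assumptions rather than our actual definition: the three Example... names, clusterSize and timeToLive are placeholders, and the property is written hcatalogLinkedServiceName, which is the casing used in the Data Factory (version 1) JSON reference as far as we can tell. Only osType and version reflect the setup described in this question.

```json
{
  "name": "ExampleHDInsightOnDemandLinkedService",
  "properties": {
    "type": "HDInsightOnDemand",
    "typeProperties": {
      "version": "3.2",
      "osType": "Linux",
      "clusterSize": 4,
      "timeToLive": "00:30:00",
      "linkedServiceName": "ExampleStorageLinkedService",
      "hcatalogLinkedServiceName": "ExampleAzureSqlMetastoreLinkedService"
    }
  }
}
```

Here linkedServiceName points at the storage account used by the cluster, while hcatalogLinkedServiceName points at the Azure SQL linked service that holds the metastore.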

Since doing this, we have noticed a rise in activity failures, and our performance has degraded (this may have been due to the Azure SQL tier we were using previously, which was spiking to 100% DTU usage).

The activities are failing with "Error in Activity: The request was aborted: The request was canceled."

This seems to be fairly consistent when using a custom metastore and doesn't seem to be an issue when using the default.

Is there anything further required to configure a custom metastore that we may have missed, or anything about our configuration or usage described here that could explain why we are seeing a significant rise in failures with the above error?

The on-demand HDInsight cluster OS is Linux, and the version is 3.2.

We also had some more severe errors, but I think this may have been because we connected a 3.3 HDI cluster to the same metastore as our 3.2 on-demand cluster, which is something this document advises against:

https://blogs.msdn.microsoft.com/azuredatalake/2017/03/24/hive-metastore-in-hdinsight-tips-tricks-best-practices/

"Ensure that the Metastore created for one HDInsight cluster version is not shared across different HDInsight cluster versions. This is due to different Hive versions has different schemas. Example – Hive 1.2 and Hive 2.1 clusters trying to use same Metastore."

The component versioning page also seems to suggest that there are breaking changes between 3.2 and 3.3 for Hive and HCatalog:

https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-component-versioning

Kranthi Pakala
Nosmadas