Questions tagged [azure-hdinsight]

Questions about Azure HDInsight, a managed Apache Hadoop service that lets you run Apache Spark, Apache Hive, Apache Kafka, Apache HBase, and more in the Microsoft Azure cloud.

934 questions
6
votes
1 answer

How to use Avro on HDInsight Spark/Jupyter?

I am trying to read an Avro file inside an HDInsight Spark/Jupyter cluster but got u'Failed to find data source: com.databricks.spark.avro. Please find an Avro package at http://spark.apache.org/third-party-projects.html;' Traceback (most recent…
Jiew Meng
6
votes
4 answers

Submit a Spark job from C# and get results

As per the title, I would like to submit a calculation to a Spark cluster (local/HDInsight in Azure) and get the results back in a C# application. I am aware of Livy, which I understand is a REST API application sitting on top of…
Stefano d'Antonio
6
votes
0 answers

Error creating plugin: org.apache.hadoop.metrics2.sink.WasbAzureIaasSink

I have created an HDI (3.6) Spark (2.1.0) cluster in Azure and installed my custom application. When I start my application, I get the following error in my custom application log. Error log: 2017-04-28 07:21:55.926 GMT+0000 WARN [main]…
Galet
6
votes
3 answers

Remotely execute a Spark job on an HDInsight cluster

I am trying to automatically launch a Spark job on an HDInsight cluster from Microsoft Azure. I am aware that several methods exist to automate Hadoop job submission (provided by Azure itself), but so far I have not been able to find a way to…
Mikel Urkia
6
votes
1 answer

Opening a port on HDInsight cluster on Azure

I have a Microsoft Azure HDInsight cluster. I RDP into the node and start an application that binds to port 8080. I would like to be able to connect to this application from outside the cluster. I have my cluster connection string…
mangusbrother
6
votes
2 answers

HDInsight: HBase or Azure Table Storage?

Currently my team is creating a solution that would use HDInsight. We will be getting 5 TB of data daily and will need to run some MapReduce jobs on this data. Would there be any performance/cost difference if our data is stored in Azure Table…
Victor F
6
votes
3 answers

How to submit Apache Spark job to Hadoop YARN on Azure HDInsight

I am very excited that HDInsight switched to Hadoop version 2, which supports Apache Spark through YARN. Apache Spark is a much better fitting parallel programming paradigm than MapReduce for the task that I want to perform. I was unable to find any…
Niek Tax
5
votes
4 answers

Connect to Kafka installed on HDInsight (Azure)

I need to connect from an external Java application to a Kafka cluster that was started as part of HDInsight on Azure. I have a cluster with 3 broker instances, 3 ZooKeepers and one ZooKeeper client. Now my question: how to specify broker connection…
Dewfy
5
votes
1 answer

Reading a CSV file from Azure Blob storage with PySpark

I'm trying to do a machine learning project using a PySpark HDInsight cluster on Microsoft Azure. To operate on my cluster I use a Jupyter notebook. Also, I have my data (a CSV file) stored in Azure Blob storage. According to the documentation…
5
votes
1 answer

How to add external jar to spark in HDInsight?

I am trying to install the Azure CosmosDB Spark connector in an HDInsight Spark cluster on Azure. (GitHub) I am new to the Spark environment and I couldn't find a proper way to add the connector jars to the Spark config. Methods I used: Method 1 I…
Anis Tissaoui
5
votes
3 answers

How to kill a Spark/YARN job via Livy

I am trying to submit a Spark job via Livy using the REST API. But if I run the same script multiple times, it runs multiple instances of the job with different job IDs. I am looking for a way to kill a Spark/YARN job running with the same name before starting a new one.…
roy
5
votes
2 answers

Azure Data Factory can't access HDInsight cluster in IP restricted VNet

I have an HDInsight Hadoop cluster (Linux, deployed separately) on an Azure VNet (restricting client IPs using an NSG). Azure SQL firewall has an option called "Allow access to Azure services", which allows Data Factory to access Azure SQL. In VNet there…
5
votes
3 answers

How to submit a python wordcount on HDInsight Spark cluster from Jupyter

I am trying to run a Python word count on a Spark HDInsight cluster, and I'm running it from Jupyter. I'm not actually sure if this is the right way to do it, but I couldn't find anything helpful about how to submit a standalone Python app on…
5
votes
0 answers

Error when querying Azure storage analytics logs from multiple storage accounts

I have multiple Azure storage accounts and I am trying to use HDInsight to query the storage analytics logs. I want to use a single query across all the storage accounts, so I created an external Hive table and added a partition for each storage…
Mike Goodwin
5
votes
1 answer

Submit C# MapReduce Job Windows Azure HDInsight - Response status code does not indicate success: 500 (Server Error)

I'm trying to submit a MapReduce job to an HDInsight cluster. In my job I didn't write the reduce portion because I don't want to reduce anything. All I want to do is parse each filename and append the values to every line in the file. So that I…
Can Atuf Kansu