Questions tagged [azure-hdinsight]

Questions about Azure HDInsight, a managed Apache Hadoop service that lets you run Apache Spark, Apache Hive, Apache Kafka, Apache HBase, and more in the Microsoft Azure cloud.

934 questions
4
votes
2 answers

Error while using the Delta Lake source in Spark 2.4 (HDInsight)

I am getting the error below; the same code works in Databricks but not in HDInsight. I have also added the Delta library and the hadoop-azure library to the classpath: io.delta:delta-core_2.11:0.5.0,org.apache.hadoop:hadoop-azure:3.1.3 ERROR…
4
votes
2 answers

Error through remote Spark Job: java.lang.IllegalAccessError: class org.apache.hadoop.hdfs.web.HftpFileSystem

Problem: I am trying to run a remote Spark job through IntelliJ with a Spark HDInsight cluster (HDI 4.0). In my Spark application I am trying to read an input stream from a folder of Parquet files in Azure Blob storage using Spark's Structured…
4
votes
1 answer

Not able to see 'Lifecycle management' option for ADLS Gen2

I have created an ADLS (Azure Data Lake Storage) Gen2 resource (StorageV2 with hierarchical namespace enabled). The region I created the resource in is Central US, the performance/access tier is Standard/Hot, and replication is LRS. But for this…
4
votes
1 answer

Why is an empty file with the name of the folder created inside an Azure Blob storage container?

I am running HiveQL through an HDInsight on-demand cluster, which does the following: spools the data from a Hive view; creates a folder named abcd inside a Blob storage container named XYZ; stores the view data in a file inside the abcd…
Surya
  • 51
  • 6
4
votes
1 answer

Power BI & Spark - ODBC: ERROR [HY000] [Microsoft][ThriftExtension] (4)

I am connecting Power BI to Spark but getting this error after attempting connection: Details: "ODBC: ERROR [HY000] [Microsoft][ThriftExtension] (4) Error occurred while contacting server: SSL_read: error code: 0. The connection has been…
4
votes
2 answers

Read a JSON file with 12 nested levels into Hive in Azure HDInsight

I tried to create a schema for the JSON file manually and tried to create a Hive table, and I am getting "column type name length 10888 exceeds max allowed length 2000". I am guessing I have to change the metastore details, but I am not sure where is…
Avinash Nishanth S
  • 514
  • 1
  • 5
  • 15
4
votes
1 answer

Error Code: JA018 while running Oozie workflow in HDInsight spark2 cluster

I am scheduling an Oozie job with the following structure in an Azure HDInsight spark2 cluster. I scheduled the job using the following commands: oozie job -config /job.properties -run oozie job -config /coordinator.properties -run But…
sathya
  • 1,982
  • 1
  • 20
  • 37
4
votes
2 answers

Configure external jars with HDI Jupyter Spark (Scala) notebook

I have an external custom jar that I would like to use with Azure HDInsight Jupyter notebooks; the Jupyter notebooks in HDI use Spark Magic and Livy. Within the first cell of the notebook, I'm trying to use the jars configuration: %%configure…
Denny Lee
  • 3,154
  • 1
  • 20
  • 33
4
votes
1 answer

How to launch Spark 2.0 from HDInsight using Azure Automation

I can't figure out how to launch HDInsight Spark 2.0 from an Azure Automation graphical runbook. I have an existing runbook that works with HDInsight using Spark 1.6. Normally, I would update the version string from 3.4 to 3.5, but it appears that…
aaronsteers
  • 2,277
  • 2
  • 21
  • 38
4
votes
2 answers

How to read Azure Table Storage data from Apache Spark running on HDInsight

Is there any way of doing that from a Spark application running on Azure HDInsight? We are using Scala. Azure Blobs are supported (through WASB). I don't understand why Azure Tables aren't. Thanks in advance
Jose Parra
  • 877
  • 9
  • 23
4
votes
2 answers

Local emulation for Azure + HDInsight

The task is to implement the T part (transform) of an ETL project in the Azure cloud. I believe HDInsight is the right service to use for it, but I am not sure. Please confirm or refute this choice. I am quite new to the field and would appreciate if someone…
Paul
  • 1,879
  • 1
  • 23
  • 44
3
votes
1 answer

Repartition in Hadoop

My question is mostly theoretical, but I have some tables that already follow some sort of partition scheme. Let's say my table is partitioned by day, but after working with the data for some time we want to change it to month partitions instead, I…
frammnm
  • 537
  • 1
  • 5
  • 17
3
votes
0 answers

Spark Kafka - Cannot fetch record for offset in 120000 milliseconds

I'm using Spark to read from Kafka topic. This is my code: val df = spark.readStream.format("kafka"). option("kafka.bootstrap.servers", "myendpoint.servicebus.windows.net:9093"). option("kafka.security.protocol", "SASL_SSL"). …
xzk
  • 827
  • 2
  • 18
  • 43
3
votes
2 answers

Optimize Hive Query. java.lang.OutOfMemoryError: Java heap space/GC overhead limit exceeded

How can I optimize a query of this form since I keep running into this OOM error? Or come up with a better execution plan? If I removed the substring clause, the query would work fine, suggesting that this takes a lot of memory. When the job fails,…
user7644509
  • 130
  • 9
3
votes
1 answer

Spark submit failing in YARN cluster mode when specifying --files in an Azure HDInsight cluster

Spark submit fails in YARN cluster mode but succeeds in client mode. Spark submit: spark-submit --master yarn --deploy-mode cluster \ --py-files packages.zip,deps2.zip \ --files…