Highest Voted 'azure-hdinsight' Questions

4 votes, 2 answers

Error while using the Delta Lake source in Spark 2.4 (HDInsight)

I am getting the error below; the same code works in Databricks but not in HDInsight. I have also added the Delta library and the hadoop-azure library to the classpath: io.delta:delta-core_2.11:0.5.0,org.apache.hadoop:hadoop-azure:3.1.3 ERROR…

apache-spark azure-hdinsight delta-lake

asked Jul 24 '20 at 21:45

NITIN GUPTA

59
4

4 votes, 2 answers

Error through remote Spark Job: java.lang.IllegalAccessError: class org.apache.hadoop.hdfs.web.HftpFileSystem

Problem: I am trying to run a remote Spark job through IntelliJ with a Spark HDInsight cluster (HDI 4.0). In my Spark application I am trying to read an input stream from a folder of Parquet files in Azure Blob storage using Spark's Structured…

scala apache-spark hadoop spark-structured-streaming azure-hdinsight

asked Jul 13 '20 at 16:18

Maria

121
1
8

4 votes, 1 answer

Not able to see 'Lifecycle management' option for ADLS Gen2

I have created an ADLS (Azure Data Lake Storage) Gen2 resource (StorageV2 with hierarchical namespace enabled). The region I created the resource in is Central US, the performance/access tier is Standard/Hot, and replication is LRS. But for this…

azure-storage azure-blob-storage azure-data-lake azure-hdinsight azure-data-lake-gen2

asked Oct 25 '19 at 11:32

Dhiraj

3,396
4
41
80

4 votes, 1 answer

Why is an empty file with the name of the folder created inside an Azure Blob storage container?

I am running HiveQL through an HDInsight on-demand cluster which does the following: (1) spools the data from a Hive view; (2) creates a folder named abcd inside a Blob storage container named XYZ; (3) stores the view data in a file inside the abcd…

azure-blob-storage hiveql azure-hdinsight

asked Oct 18 '18 at 11:10

Surya

51
6

4 votes, 1 answer

Power BI & Spark - ODBC: ERROR [HY000] [Microsoft][ThriftExtension] (4)

I am connecting Power BI to Spark but get this error after attempting to connect. Details: "ODBC: ERROR [HY000] [Microsoft][ThriftExtension] (4) Error occurred while contacting server: SSL_read: error code: 0. The connection has been…

apache-spark powerbi thrift azure-hdinsight spark-thriftserver

asked Jan 01 '18 at 14:53

mahendra maid

437
1
6
14

4 votes, 2 answers

Read a JSON file with 12 nested levels into Hive in Azure HDInsight

I tried to create a schema for the JSON file manually and then create a Hive table, and I am getting "column type name length 10888 exceeds max allowed length 2000". I am guessing I have to change the metastore details but I am not sure where is…

json hive apache-spark-sql azure-hdinsight

asked Sep 13 '17 at 08:57

Avinash Nishanth S

514
1
5
15

4 votes, 1 answer

Error Code: JA018 while running Oozie workflow in HDInsight Spark2 cluster

I am scheduling an Oozie job with the following structure in an Azure HDInsight Spark2 cluster. I scheduled the job using the following commands: oozie job -config /job.properties -run and oozie job -config /coordinator.properties -run. But…

azure apache-spark oozie azure-hdinsight oozie-coordinator

asked Jul 14 '17 at 06:14

sathya

1,982
1
20
37

4 votes, 2 answers

Configure external jars with HDI Jupyter Spark (Scala) notebook

I have an external custom jar that I would like to use with Azure HDInsight Jupyter notebooks; the Jupyter notebooks in HDI use Spark Magic and Livy. Within the first cell of the notebook, I'm trying to use the jars configuration: %%configure…

apache-spark jupyter-notebook azure-hdinsight livy

asked Mar 04 '17 at 18:08

Denny Lee

3,154
1
20
33

4 votes, 1 answer

How to launch Spark 2.0 from HDInsight using Azure Automation

I can't figure out how to launch HDInsight Spark 2.0 from an Azure Automation graphical runbook. I have an existing runbook that works with HDInsight using Spark 1.6. Normally, I would update the version string from 3.4 to 3.5, but it appears that…

azure apache-spark azure-hdinsight azure-automation

asked Nov 08 '16 at 02:56

aaronsteers

2,277
2
21
38

4 votes, 2 answers

How to read Azure Table Storage data from Apache Spark running on HDInsight

Is there any way of doing that from a Spark application running on Azure HDInsight? We are using Scala. Azure Blobs are supported (through WASB). I don't understand why Azure Tables aren't. Thanks in advance.

azure apache-spark azure-storage azure-hdinsight

asked Aug 14 '15 at 00:21

Jose Parra

877
9
23

4 votes, 2 answers

Local emulation for Azure + HDInsight

The task is to implement the T part (transform) of an ETL project in the Azure cloud. I believe HDInsight is the right service to use for it, but I am not sure. Please confirm or refute this choice. I am quite new to the field and would appreciate it if someone…

c# azure etl azure-hdinsight

asked Aug 22 '13 at 18:32

Paul

1,879
1
23
44

3 votes, 1 answer

Repartition in Hadoop

My question is mostly theoretical, but I have some tables that already follow some sort of partition scheme. Let's say my table is partitioned by day, but after working with the data for some time we want to change to month partitions instead, I…

hadoop hive azure-hdinsight hive-partitions hiveddl

asked Aug 11 '21 at 10:16

frammnm

537
1
5
17

3 votes, 0 answers

Spark Kafka - Cannot fetch record for offset in 120000 milliseconds

I'm using Spark to read from a Kafka topic. This is my code: val df = spark.readStream.format("kafka"). option("kafka.bootstrap.servers", "myendpoint.servicebus.windows.net:9093"). option("kafka.security.protocol", "SASL_SSL"). …

scala apache-spark apache-kafka azure-eventhub azure-hdinsight

asked Nov 08 '20 at 08:27

xzk

827
2
18
43

3 votes, 2 answers

Optimize Hive Query. java.lang.OutOfMemoryError: Java heap space/GC overhead limit exceeded

How can I optimize a query of this form since I keep running into this OOM error? Or come up with a better execution plan? If I removed the substring clause, the query would work fine, suggesting that this takes a lot of memory. When the job fails,…

sql hive out-of-memory azure-hdinsight beeline

asked Jul 08 '20 at 22:13

user7644509

130
9

3 votes, 1 answer

Spark submit failing in YARN cluster mode when specifying --files in an Azure HDInsight cluster

Spark submit fails in YARN cluster mode but succeeds in client mode. Spark submit: spark-submit --master yarn --deploy-mode cluster \ --py-files packages.zip,deps2.zip \ --files…

apache-spark pyspark azure-hdinsight

asked Jan 29 '20 at 18:09

Sanjeev Roy

51
2
