Questions tagged [spark-hive]

Use this tag for questions about the spark-hive module or HiveContext.

Apache Spark Hive is a module for "Hive and structured data processing" on Spark, a fast and general-purpose cluster computing system. It builds on Spark SQL and is used to create a HiveContext, which provides a superset of the functionality of SQLContext.
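HiveContext dates from Spark 1.x; since Spark 2.0 the same Hive integration is usually reached through `SparkSession.builder().enableHiveSupport()`. The following is a minimal Scala sketch, assuming Spark 2.x or later with the spark-hive module on the classpath. The object name `HiveSessionSketch`, the helper `hiveSession`, and the hard-coded local master are hypothetical choices for illustration, not part of the spark-hive API.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch, assuming Spark 2.x+ with the spark-hive module on the classpath.
// HiveSessionSketch and hiveSession are hypothetical names used only for illustration.
object HiveSessionSketch {

  // Builds (or reuses) a SparkSession whose catalog is backed by the Hive metastore.
  // enableHiveSupport() throws IllegalArgumentException if the Hive classes are missing.
  def hiveSession(appName: String): SparkSession =
    SparkSession
      .builder()
      .appName(appName)
      .master("local[*]") // local mode for illustration; spark-submit normally supplies the master
      .enableHiveSupport()
      .getOrCreate()

  def main(args: Array[String]): Unit = {
    val spark = hiveSession("hive-session-sketch")

    // Reports which catalog implementation the session uses ("hive" when Hive support is on).
    println(spark.conf.get("spark.sql.catalogImplementation"))

    // Without a hive-site.xml on the classpath, Spark falls back to a local embedded
    // Derby metastore; with one, it uses the metastore configured there.
    spark.sql("SHOW DATABASES").show()

    spark.stop()
  }
}
```

When submitting to a cluster, drop the `.master(...)` call so that spark-submit controls where the job runs.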

76 questions

0 votes, 0 answers

spark.sql is not working when code runs in an AWS Kubernetes pod

My project code is running in a K8s pod, and all we do is upload some data into an S3 bucket and create some Glue tables in Hive that point to that data in the S3 bucket. We use Spark in Scala for the S3 operations, and Spark is supposed to run spark.sql to…

0 votes, 1 answer

What is the difference between using Spark Hive and any other Spark with NoSQL or SQL database?

I am new to Spark. I have been trying to use Spark Hive, Spark MySQL, and Spark Cassandra. However, I still don't know the differences between them: which is slower, which is more expensive, what their disadvantages are, and how they actually work. Can…

0 votes, 1 answer

Overriding Apache Spark dependency (spark-hive)

Tech stack: Spark 2.4.4, Hive 2.3.3, HBase 1.4.8, sbt 1.5.8. What is the best practice for overriding a Spark dependency? Suppose that a Spark app (CLUSTER MODE) already has the spark-hive (2.4.4) dependency (PROVIDED). I compiled and assembled "custom"…
Code_VM (reputation 23)

0 votes, 0 answers

Create table in hive through spark

I am trying to connect to Hive through Spark using the code below, but I am unable to do so. The code fails with NoSuchDatabaseException: Database 'raw' not found. I have a database named 'raw' in Hive. What am I missing here? val spark = SparkSession …
hampi2017 (reputation 701)

0 votes, 1 answer

Spark Java append data to Hive table

I'm facing a problem when trying to append data to a Hive table. I declared the session correctly, and I can retrieve data from the table: SparkSession spark = SparkSession .builder() .appName("Java Spark…

0 votes, 0 answers

Spark job fails in Oozie when Hive support is enabled

I am trying to schedule an Oozie workflow with a Spark action and Hive support enabled. When it was a plain Spark job without Hive support, the actions ran properly. After adding Hive support, I can run the Spark job with spark-submit, but when I am trying…
Kalpesh (reputation 694)

0 votes, 0 answers

Databricks - Library Installation Logs

Can you please guide me on where I can find the library installation logs in Azure Databricks? I am trying to install the spark-sql_2.11 package from Maven, which is failing, and there are no details on why it is failing. It would be great if…

0 votes, 1 answer

Spark Structured Streaming using spark-acid writeStream (with checkpoint) throwing org.apache.hadoop.fs.FileAlreadyExistsException

In our Spark app, we use Spark Structured Streaming. It uses Kafka as the input stream and HiveAcid as the writeStream sink to a Hive table. HiveAcid is an open-source library called spark-acid from Qubole: https://github.com/qubole/spark-acid Below is our…

0 votes, 1 answer

Cannot run simple hql file with pyspark

I am using pyspark==2.4.3 and I just want to run an hql file containing use myDatabaseName; show tables; Here is what I tried: from os.path import expanduser, join, abspath from pyspark.sql import SparkSession from pyspark.sql import Row #…
AbtPst (reputation 7,778)

0 votes, 1 answer

Spark SQL - Runtime Exception while determining schema

I am trying to query a table in a remote (on-prem) Hive database from my laptop using Spark SQL. I am able to connect to it and retrieve the latest partition. However, when I try to retrieve a column (let's say pid), it throws the below…
sha256 (reputation 1)

0 votes, 0 answers

How to disable logs on Hive shell when using Spark as the execution engine?

I want to save the results of my Hive queries to a file, but the output from Hive contains a lot of logs as well. Is there any way to disable them? I just want to capture the result of the query. hive> show databases; 2019-03-12 08:49:38 INFO …
wittyameta (reputation 375)

0 votes, 1 answer

HiveOnSpark for Cloudera Manager 5.15 or 6.0?

It seems that HiveOnSpark is not supported in Cloudera Manager. https://www.cloudera.com/documentation/spark2/latest/topics/spark2_known_issues.html#ki_hive_on_spark Although I noticed someone saying that Hive version 2.2.0 does support Spark…
Sonic (reputation 1)

0 votes, 0 answers

Spark program is not able to connect to MySQL Hive context through Eclipse

I have set up the Hive metastore in MySQL, and it can be accessed through Hive to create databases and tables. If I access a Hive table through spark-shell, I get the table info correctly from the MySQL Hive metastore. But it is…
Adithya (reputation 1)

0 votes, 2 answers

Connecting to Hive from Spark in IntelliJ

I'm trying to connect to a remote Hive from within my Spark program in IntelliJ, installed on my local machine. I placed the Hadoop cluster config files on the local machine and configured the HADOOP_CONF_DIR environment variable in the IntelliJ run configurations…
hitesh sahni (reputation 1)

0 votes, 2 answers

Unable to execute Hive queries using spark-submit

I am not able to run Hive queries using the spark-submit command, but the same queries execute in spark-shell. I am using AWS EMR as the cluster. Below is my code, written in the Eclipse Scala IDE: object HiveTest { def main(args: Array[String]): Unit…
Vinay Kumar Dudi (reputation 137)