Questions tagged [spark-hive]

Use for questions about the spark-hive module or HiveContext

Apache Spark Hive is a module for "Hive and structured data processing" on Spark, a fast and general-purpose cluster computing system. It is a superset of Spark SQL and is used to create a HiveContext, analogous to SQLContext.
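For orientation, a minimal sketch of creating a HiveContext in Spark 1.x (the app name and query are illustrative):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext(new SparkConf().setAppName("spark-hive-example"))
// HiveContext extends SQLContext, adding HiveQL, Hive UDFs, and metastore access.
val hiveContext = new HiveContext(sc)
hiveContext.sql("SHOW TABLES").show()
```

In Spark 2.x and later, the same capability is reached through SparkSession.builder().enableHiveSupport().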

76 questions
3
votes
2 answers

Running Hive Query in Spark through Oozie 4.1.0.3

Getting a "table not found" exception while running a Hive query in Spark through Oozie 4.1.0.3 as a Java action. Copied hive-site.xml and hive-default.xml from the HDFS path; workflow.xml used:
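A likely cause in this setup is the Spark action not seeing hive-site.xml, so the HiveContext falls back to a local Derby metastore where the table does not exist. A minimal sketch of the job's entry point, with all names hypothetical:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object OozieHiveQuery {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("oozie-hive-query"))
    // hive-site.xml must be on the driver classpath (e.g. shipped via a
    // <file> element in the Oozie action) so the real metastore URI is used.
    val hiveContext = new HiveContext(sc)
    hiveContext.sql("SELECT COUNT(*) FROM some_db.some_table").show()
  }
}
```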
2
votes
1 answer

Dbeaver Exception: Data Source was invalid

I am trying to work with DBeaver, processing data via Spark Hive. The connection is stable, as the following command works: select * from database.table limit 100. However, as soon as I deviate from that simple fetch query I get an exception.…
Lazloo Xp
  • 858
  • 1
  • 11
  • 36
2
votes
1 answer

Spark Streaming + Hive

We are in the process of building an application that takes data from a source system through Flume and then, via the Kafka messaging system, into Spark Streaming for in-memory processing; after shaping the data into a data frame we will put it into Hive… (a sketch of this pipeline follows this entry)
Owais Ajaz
  • 244
  • 5
  • 20
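A minimal sketch of the Kafka-to-Hive leg of such a pipeline, assuming Spark 1.6-era APIs (the spark-streaming-kafka module), a hypothetical events topic, and an existing Hive table of the same name:

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val sc  = new SparkContext(new SparkConf().setAppName("kafka-to-hive"))
val ssc = new StreamingContext(sc, Seconds(30))

val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, Map("metadata.broker.list" -> "broker:9092"), Set("events"))

stream.foreachRDD { rdd =>
  // In production you would reuse a singleton context rather than rebuild it per batch.
  val hiveContext = new HiveContext(rdd.sparkContext)
  import hiveContext.implicits._
  val df = rdd.map(_._2).toDF("raw_event") // Kafka message values as one string column
  df.write.mode(SaveMode.Append).insertInto("events")
}

ssc.start()
ssc.awaitTermination()
```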
2
votes
1 answer

Errors during maven install when adding spark-hive_2.10 dependency in maven

I am using Scala IDE 4.6.0 and created a Maven project using an archetype I got from the book Spark in Action. I have to use Scala 2.10.4 and Spark 1.6.2. I created a basic project using this archetype and added the spark-hive dependency to the… (a sample dependency block follows this entry)
jam_ab
  • 71
  • 1
  • 3
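For reference, the Maven coordinates for that Scala/Spark combination would look like this; the provided scope is an assumption (usual when the cluster supplies the Spark jars):

```xml
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-hive_2.10</artifactId>
  <version>1.6.2</version>
  <scope>provided</scope>
</dependency>
```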
2
votes
1 answer

using HiveContext in spark sql throws an exception

I have to use HiveContext instead of SQLContext because I use some window functions that are available only through HiveContext. I have added the following lines to my pom.xml: org.apache.spark… (a sketch follows this entry)
A.B.
  • 51
  • 2
  • 10
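In Spark 1.x, window functions do require a HiveContext rather than a plain SQLContext. A minimal sketch, with the table and column names hypothetical:

```scala
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.row_number
import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc) // sc: an existing SparkContext
val df = hiveContext.table("employees")

// row_number() over a window only resolves with Hive support in Spark 1.x.
val ranked = df.withColumn(
  "rank", row_number().over(Window.partitionBy("dept").orderBy("salary")))
ranked.show()
```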
1
vote
0 answers

Does REFRESH TABLE update the cache entries of all tables?

I am looking for an approach to update all the table metadata cache entries just before the write operation. I have found spark.catalog.refreshTable(table); however, I am not sure whether it will update every table's metadata store… (a sketch follows this entry)
izhad
  • 19
  • 3
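refreshTable invalidates the cached metadata and data for the one table named, not for the whole catalog; covering every table means iterating the catalog yourself. A sketch assuming Spark 2.x+ and a hypothetical database name:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().enableHiveSupport().getOrCreate()

// Refreshes the cache entry for a single table only.
spark.catalog.refreshTable("mydb.events")

// Hypothetical loop to refresh every table in one database before writing:
spark.catalog.listTables("mydb").collect().foreach { t =>
  spark.catalog.refreshTable(s"${t.database}.${t.name}")
}
```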
1
vote
0 answers

pyspark Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient

New to Spark; I tried other solutions from Stack Overflow but no luck. I have installed Spark 3.1.2 and did some configuration (in spark/conf/spark-defaults.conf) to point to AWS RDS MySQL as a remote metastore: spark.jars.packages…
user1531248
  • 521
  • 1
  • 5
  • 17
1
vote
1 answer

Exception in Connecting Dataproc Hive Server using Java and Spark Eclipse

I am trying to access the Hive server present in GCP Dataproc from my local machine (Eclipse) using Java and Spark, but I am getting the below error while starting the application. I tried to find the problem but was unable to solve it. Exception in…
1
vote
1 answer

Spark saveAsTable with location at s3 bucket's root cause NullPointerException

I am working with Spark 3.0.1 and my partitioned table is stored in S3. Please find here the description of the issue. Create Table: Create table root_table_test_spark_3_0_1 ( id string, name string ) USING PARQUET PARTITIONED BY… (a sketch follows this entry)
Michael
  • 33
  • 5
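A sketch of the write pattern the title describes, assuming Spark 3.0.1, a hypothetical data frame df with id and name columns, and a bucket root as the table location:

```scala
// Writing a partitioned table whose location is the bare bucket root
// ("s3a://my-bucket/", no key prefix), per the question title.
df.write
  .partitionBy("name")
  .option("path", "s3a://my-bucket/")
  .saveAsTable("root_table_test_spark_3_0_1")
```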
1
vote
2 answers

Result-set inconsistency between hive and hive-llap

We are using Hive 3.1.x clusters on HDI 4.0, one being LLAP and the other just Hive. We've created managed tables on both clusters with the row count being 272409. Before the merge on both…
Vinay K L
  • 45
  • 1
  • 10
1
vote
0 answers

Spark stand-alone v 2.3.2 Failing test

I have built Spark v2.3.2 on a big-endian platform using AdoptOpenJDK 1.8. The build is successful, but we encounter test case failures in the following module. I wanted some information related to this failing test: how severely would this…
1
vote
1 answer

External table is empty when ORC data is saved

I want to write ORC data into an external Hive table from a Spark data frame. When I save the data frame as a table, the data is sent to the existing external table; however, when I try to save the data in ORC format into the directory and then read… (a sketch follows this entry)
Cassie
  • 2,941
  • 8
  • 44
  • 92
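A minimal sketch of the two approaches being contrasted, assuming Spark 2.x with Hive support and hypothetical paths and names:

```scala
// Approach 1: append directly into the existing external table (works per the question).
df.write.mode("append").format("orc").saveAsTable("ext_orc_table")

// Approach 2: write ORC files under the table's location, then tell the
// metastore to look again; without a refresh the table can appear empty.
df.write.mode("overwrite").orc("hdfs:///warehouse/ext_orc_table")
spark.sql("REFRESH TABLE ext_orc_table")
// For a partitioned external table, new partitions also need registering:
// spark.sql("MSCK REPAIR TABLE ext_orc_table")
```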
1
vote
0 answers

Not able to read Hive table using sparkR submit

Here is my code: sc <- sparkR.init(master = "local[*]", sparkEnvir = list(spark.driver.memory="8g")) hiveContext <- sparkRHive.init(sc) sqlQuery <- "SELECT * from table ABC" joinSQL <- sql(hiveContext, sqlQuery) This gives the error…
Manoj
  • 11
  • 3
1
vote
0 answers

How to read snappy compressed sequence File in spark

We have huge legacy files sitting in our Hadoop cluster in compressed sequence file format. The sequence files were created by a Hive ETL. Let's say I have a table in Hive created using the following DDL: CREATE TABLE sequence_table( col1…
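A sketch of reading such files directly with the core API, assuming the usual layout of a Hive SEQUENCEFILE table (the key is ignorable and the delimited row text sits in the value); the path is hypothetical:

```scala
import org.apache.hadoop.io.{BytesWritable, Text}

// Snappy decompression is handled transparently by the Hadoop input format.
val rows = sc.sequenceFile[BytesWritable, Text]("hdfs:///warehouse/sequence_table")
  .map { case (_, value) => value.toString.split('\u0001') } // default Hive field delimiter
```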
1
vote
1 answer

How can I use an SQL subquery within Spark 1.6

How can I convert the following query to be compatible with Spark 1.6, which does not support subqueries: SELECT ne.device_id, sp.device_hostname FROM `table1` ne INNER JOIN `table2` sp ON sp.device_hostname = (SELECT… (a sketch of the general rewrite follows this entry)
user6666914
  • 31
  • 1
  • 6
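Since the scalar subquery in the excerpt is cut off, only the general shape of the rewrite can be shown: Spark 1.6 cannot evaluate a subquery inside a join condition, so it is pre-computed as a derived table and joined on equality (the inner SELECT and the join key below are placeholders):

```scala
val result = hiveContext.sql("""
  SELECT ne.device_id, sp.device_hostname
  FROM table1 ne
  INNER JOIN (
    SELECT t2.device_hostname
    FROM table2 t2
    -- conditions from the original scalar subquery would go here
  ) sp
    ON sp.device_hostname = ne.device_id
""")
```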