Questions tagged [spark-hive]

Used for questions about the spark-hive module or HiveContext

Apache Spark Hive is a module for "Hive and structured data processing" on Spark, a fast and general-purpose cluster computing system. It is a superset of Spark SQL and is used to create a HiveContext, similar to SQLContext.

76 questions
1
vote
1 answer

Testing against a HiveContext when dependencies are provided throws java.lang.SecurityException

When running unit tests that create a Spark context I get a java.lang.SecurityException. I understand what the cause is but am not sure how to track down how to solve it, this being that multiple dependencies share the same package javax.servlet…
Brett Ryan
  • 26,937
  • 30
  • 128
  • 163
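This kind of java.lang.SecurityException ("signer information does not match") usually means two different jars provide javax.servlet classes and one of them is signed. A minimal sketch of one common remedy (not taken from the question or its answers), assuming an sbt build and that the duplicate arrives transitively; the artifacts listed are typical offenders and may differ in your dependency tree:

    // build.sbt: keep only one javax.servlet provider on the test classpath.
    excludeDependencies ++= Seq(
      ExclusionRule("javax.servlet", "servlet-api"),
      ExclusionRule("org.mortbay.jetty", "servlet-api")
    )

Inspecting the dependency tree (for example with the sbt-dependency-graph plugin) helps identify which library drags in the second servlet-api.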
1
vote
1 answer

Clear Spark job history from Spark master UI

I'm working with Spark, and I want to clean up my Spark master UI by clearing out all the previous Failed/Finished jobs. I'm unable to figure out how to do this. I've tried deleting logs from HDFS, but the job entries still show up in the UI.
Nidhi jain
  • 123
  • 3
  • 14
1
vote
1 answer

How to optimize Spark SQL operations on a large data frame?

I have a large Hive table (~9 billion records and ~45 GB in ORC format). I am using Spark SQL to do some profiling of the table, but it takes too much time to do any operation on it. Just a count on the input data frame itself takes ~11 minutes to…
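For a scan-heavy profile like this, the usual levers are column pruning, pushing filters into the ORC reader, and caching the projected subset before running several aggregations over it. A rough sketch, assuming Spark 2.x with Hive support; the table, column, and partition names are placeholders:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("profile-large-orc-table")
      .config("spark.sql.orc.filterPushdown", "true")   // let the ORC reader apply filters
      .config("spark.sql.shuffle.partitions", "400")    // size shuffles for ~45 GB of input
      .enableHiveSupport()
      .getOrCreate()

    // Read only the columns the profiling actually needs and filter on a
    // partition column so Spark can prune instead of scanning 9B rows.
    val df = spark.table("big_orc_table")
      .select("id", "event_date", "amount")
      .filter("event_date >= '2017-01-01'")

    df.cache()          // reuse the pruned projection across several aggregations
    println(df.count()) // the first action materializes the cache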
1
vote
1 answer

How to create external Hive table without location?

I have a Spark SQL 2.1.1 job on a YARN cluster in cluster mode where I want to create an empty external Hive table (partitions with locations will be added in a later step). CREATE EXTERNAL TABLE IF NOT EXISTS new_table (id BIGINT, StartTime…
T. Bombeke
  • 107
  • 1
  • 8
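Spark 2.1 rejects CREATE EXTERNAL TABLE without a LOCATION clause ("CREATE EXTERNAL TABLE must be accompanied by LOCATION"), so one workaround is to give the table a placeholder location and attach the real directories per partition afterwards. A sketch under that assumption; the column list, paths, and partition column are hypothetical, not the ones elided in the question:

    // Assumes a SparkSession with .enableHiveSupport() (or a HiveContext on 1.x).
    spark.sql("""
      CREATE EXTERNAL TABLE IF NOT EXISTS new_table (id BIGINT, start_time TIMESTAMP)
      PARTITIONED BY (dt STRING)
      STORED AS ORC
      LOCATION 'hdfs:///warehouse/placeholder/new_table'
    """)

    // Each partition can still point at its real directory later.
    spark.sql("""
      ALTER TABLE new_table ADD IF NOT EXISTS PARTITION (dt = '2017-06-01')
      LOCATION 'hdfs:///data/new_table/dt=2017-06-01'
    """)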
1
vote
2 answers

Spark Hive: Filter rows of one DataFrame by the values of another DataFrame's column

I have the following two DataFrames:
DataFrame "dfPromotion":
  date       | store
  ===================
  2017-01-01 | 1
  2017-01-02 | 1
DataFrame "dfOther":
  date       | store
  ===================
  2017-01-01 | 1
  2017-01-03 | 1
Later I…
D. Müller
  • 3,336
  • 4
  • 36
  • 84
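One common way to express this kind of filtering (a sketch, not an excerpt from the answers) is a left semi join on the shared column, which keeps only the dfPromotion rows whose date also occurs in dfOther; "leftanti" gives the complement:

    // Assumes both DataFrames have a "date" column as in the question.
    val matching = dfPromotion.join(dfOther, Seq("date"), "leftsemi")
    matching.show()
    // date       | store
    // 2017-01-01 | 1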
1
vote
1 answer

Spark: creating an array of fields with the same key

I have a Hive table which is present on top of the Spark context. The format of the table is as below:
  | key | param1 | Param 2 |
  -------------------------
  | A   | A11    | A12     |
  | B   | B11    | B12     |
  | A   | A21    | A22     |
I wanted to create…
abilng
  • 195
  • 2
  • 10
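If the goal is one row per key with all parameter values gathered into arrays, groupBy plus collect_list is the usual shape. A sketch, assuming a DataFrame df over that Hive table with the column names normalized to key, param1, param2:

    import org.apache.spark.sql.functions.collect_list

    val grouped = df.groupBy("key")
      .agg(
        collect_list("param1").as("param1s"),
        collect_list("param2").as("param2s"))

    grouped.show()
    // key | param1s    | param2s
    // A   | [A11, A21] | [A12, A22]
    // B   | [B11]      | [B12]
    // note: collect_list does not guarantee element order within each array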
1
vote
0 answers

Spark SQL dataframe.save with partitionBy is creating an array column

I am trying to save the data of a Spark SQL DataFrame to Hive. The data to be stored should be partitioned by one of the columns in the DataFrame. For that I have written the following code: val conf = new SparkConf().setAppName("Hive…
Sai Krishna
  • 624
  • 8
  • 20
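For comparison, the usual way to save a DataFrame as a Hive table partitioned by one of its columns is DataFrameWriter.partitionBy before saveAsTable; a minimal sketch with placeholder names, assuming Spark's built-in ORC support:

    // The partition column becomes directory structure (load_date=...),
    // not a data column packed inside the files.
    df.write
      .mode("overwrite")
      .partitionBy("load_date")
      .format("orc")
      .saveAsTable("mydb.partitioned_table")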
1
vote
1 answer

Spark with custom hive bindings

How can I build Spark with current (Hive 2.1) bindings instead of 1.2? http://spark.apache.org/docs/latest/building-spark.html#building-with-hive-and-jdbc-support does not mention how this works. Does Spark work well with Hive 2.x?
Georg Heiler
  • 16,916
  • 36
  • 162
  • 292
1
vote
0 answers

Spark Hive java.lang.LinkageError

When executing DROP TABLE IF EXISTS in a Spark HiveContext I'm getting the below error: hiveContext.sql("drop table if exists table_name") java.lang.LinkageError: ClassCastException: attempting…
1
vote
2 answers

Unable to view data of Hive tables after update in Spark

Case: I have a table HiveTest, which is an ORC table with transactions set to true. I loaded it in the Spark shell and viewed the data: var rdd = objHiveContext.sql("select * from HiveTest"); rdd.show() --- able to view data. Now I went to my Hive shell or Ambari…
sudhir
  • 1,387
  • 3
  • 25
  • 43
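Two things are usually in play here (stated as assumptions, since the excerpt is truncated): Spark caches table metadata and file listings, and plain Spark cannot read the delta files that Hive ACID (transactional) tables write until they are compacted. A small sketch of the refresh half, reusing the HiveContext variable named in the question:

    // Clear Spark's cached metadata for the table, then re-read it.
    objHiveContext.refreshTable("HiveTest")
    val rdd = objHiveContext.sql("select * from HiveTest")
    rdd.show()
    // Rows still sitting in uncompacted ACID deltas may remain invisible to
    // Spark until Hive runs a (major) compaction on the table.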
1
vote
1 answer

Apache Spark with Hive

How can I read/write data from/to Hive? Is it necessary to compile Spark with the Hive profile to interact with Hive? Which Maven dependencies are required to interact with Hive? I could not find good documentation to follow step by step to get…
user3313379
  • 459
  • 10
  • 21
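As a rough sketch of the usual wiring (versions are placeholders and should match the cluster's Spark): the spark-hive module supplies the Hive integration, and reads/writes then go through SQL or saveAsTable. Assuming Spark 2.x and an sbt build; the same coordinates (groupId org.apache.spark, artifactId spark-hive_2.11) apply in a Maven pom:

    // build.sbt
    //   libraryDependencies += "org.apache.spark" %% "spark-sql"  % "2.1.1"
    //   libraryDependencies += "org.apache.spark" %% "spark-hive" % "2.1.1"

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("hive-read-write")
      .enableHiveSupport()   // requires spark-hive on the classpath
      .getOrCreate()

    val employees = spark.sql("SELECT * FROM default.employees")            // read from Hive
    employees.write.mode("overwrite").saveAsTable("default.employees_copy") // write back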
0
votes
0 answers

Spark job throwing exceptions during loading data from Kafka to Hive

We have a big data cluster where we have data in Kafka topics and we load it to Hive using a Spark job (with Java 8). I was using Cloudera version 7.1.7 and Spark version 2.4.7.7.1.7.1000-141, SP1 and SP2, and even version 7.1.6. Still getting some…
0
votes
0 answers

I set up a remote Hive Postgres metastore, but while accessing it I get a "DBS" table insert error (DDL is not matching)

Hive 3.1.3, PG 12 - remote metastore; changed the Spark and Hive site.xml and used schematool to populate the default tables. Using Oracle Object Storage as Hadoop storage. I have replaced the actual path with a placeholder. Note: I have Hive and Spark on the same server.…
Harish
  • 969
  • 2
  • 10
  • 15
0
votes
0 answers

Spark events: reading a Hive table created through the Hive CLI vs a Hive table created through Spark

While working on a Spark event listener, I am a bit confused by the way Spark is behaving. Scenario 1: Hive table created using the Hive CLI. Suppose EMPLOYEE is the Hive external/internal table created using the Hive CLI, and when we read this table through…
Gurupraveen
  • 181
  • 1
  • 13
0
votes
0 answers

Spark SQL: why do queries over a table and a view perform drastically differently?

I am writing SQL queries on a Spark cluster - 5 workers (8 cores and 32 GB memory). No Hive is associated with it. I found that the performance of querying the table and querying the view are very different and want to understand their…
kqboy
  • 1
  • 2
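Without the table and view definitions this is only a guess, but a frequent cause is that a temporary view is just a stored logical plan, so every query over it re-executes the underlying computation, whereas a table scan reads already-materialized files. A sketch of how to level the comparison by caching the view; the names are hypothetical:

    derivedDf.createOrReplaceTempView("my_view")

    // Materialize the view once; subsequent queries read the cached data
    // instead of re-running the full plan each time.
    spark.sql("CACHE TABLE my_view")
    spark.sql("SELECT count(*) FROM my_view").show()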