Questions tagged [hivecontext]

Questions related to the HiveContext class of Apache Spark

A variant of Spark SQL that integrates with data stored in Hive.

Configuration for Hive is read from hive-site.xml on the classpath. It supports running both SQL and HiveQL commands.
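As a point of reference, a minimal hive-site.xml sketch of the kind these questions revolve around (all values are placeholders, not a recommended configuration; `hive.metastore.uris` points Spark at an existing remote metastore):

```xml
<!-- hive-site.xml: placed on Spark's classpath (e.g. $SPARK_HOME/conf).
     Host, port, and warehouse path below are placeholders. -->
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://metastore-host:9083</value>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>
</configuration>
```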

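Most of the questions below share the same basic setup; a minimal sketch, assuming Spark 1.x with the spark-hive module on the classpath and a running cluster (the table name is a placeholder; in Spark 2.x, HiveContext is deprecated in favour of SparkSession.builder().enableHiveSupport()):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val conf = new SparkConf().setAppName("hivecontext-example")
val sc = new SparkContext(conf)

// HiveContext reads hive-site.xml from the classpath and supports
// both SQL and HiveQL against tables in the Hive metastore.
val hiveContext = new HiveContext(sc)

// "my_table" is a placeholder; any table registered in the metastore works.
val df = hiveContext.sql("SELECT * FROM my_table LIMIT 10")
df.show()
```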

106 questions
2 votes, 3 answers

Unable to get existing Hive tables from HiveContext using Spark

I am trying to get database or table details from Hive using HiveContext in Spark, but I am unable to point to the existing Hive database, as shown below. Spark version: 2.2.0, Hive version: 2.3.0. Using the below script in Spark Shell to connect to the existing…
Ku002
2 votes, 1 answer

Spark protobuf message processing error "java.lang.RuntimeException: Unable to find proto buffer class"

Getting the below error while processing a protobuf byte-array message in Java Spark. ThrowableSerializationWrapper: Task exception could not be deserialized java.lang.RuntimeException: Unable to find proto buffer class SparkConf sparkConf = new…
Pravin Bange
2 votes, 1 answer

Spark HiveContext: Spark engine or Hive engine?

I am trying to understand Spark's HiveContext. When we write a query using hiveContext, like sqlContext = new HiveContext(sc); sqlContext.sql("select * from TableA inner join TableB on (a=b)"), is it using the Spark engine or the Hive engine? I believe above…
Rohan Nayak
2 votes, 1 answer

HiveContext: unable to access HBase table mapped in Hive as external table

I am trying to access an HBase table mapped in Hive using HiveContext in Spark, but I am getting ClassNotFoundException exceptions. Below is my code: import org.apache.spark.sql.hive.HiveContext val sqlContext = new HiveContext(sc) val df =…
user2731629
2 votes, 0 answers

Error observed when trying to create hive table from a Spark Data Frame

Created a Hive context and then trying to create a table using a view. Final_Data is a data frame. val sqlCtx = new HiveContext(sc) Final_Data.createOrReplaceTempView("Final_Prediction") sqlCtx.sql("create table results as select * from…
Varun
2 votes, 1 answer

Using HiveContext in Spark SQL throws an exception

I have to use HiveContext instead of SQLContext because I am using some window functions that are available only through HiveContext. I have added the following lines to my pom.xml: org.apache.spark
A.B.
2 votes, 1 answer

having count(distinct) not working with HiveContext query in Spark 1.6

Recently we had a Spark update from version 1.3 to 1.6, and after this update queries with "having count(distinct)" conditions are not working. We get the below error. Query: hiveContext.sql("select A1.x, A1.y, A1.z from (select concat(g,h)…
Yash_spark
2 votes, 1 answer

How to divide a numerical column into ranges and assign labels to each range in Apache Spark?

I have the following Spark dataframe:

    id  weekly_sale
    1   40000
    2   120000
    3   135000
    4   211000
    5   215000
    6   331000
    7   337000

I need to see in which of the following intervals items in the weekly_sale column fall: under 100000, between 100000…
chessosapiens
2 votes, 1 answer

Spark HiveContext - reading from external partitioned Hive table delimiter issue

I have an external partitioned Hive table whose underlying files use ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'. Reading the data via Hive directly is fine, but when using Spark's DataFrame API the delimiter '|' is not taken into consideration. Create…
ValeryC
2 votes, 0 answers

How to use a specific directory for metastore with HiveContext?

So this is what I tried in the Spark shell: scala> import org.apache.spark.sql.hive.HiveContext scala> import java.nio.file.Files scala> val hiveDir =…
aa8y
1 vote, 1 answer

Where is the hive-site.xml in Cloudera distribution?

I would like to know where the hive-site.xml file configuration is in a Cloudera distribution. Mainly because I would like to know where I can find out properties…
1 vote, 1 answer

AWS Glue HiveContext access to the Glue Data Catalog

I can read a table defined in the Glue Data Catalog from a Glue job with the glueContext. However, if I want to read the exact same table with hiveContext, I receive an error message stating that it cannot find that table. In my opinion the…
C.Tomas
1 vote, 0 answers

Spark job creating only 1 stage task when executed

I am trying to load data from DB2 to Hive using Spark 2.1.1 and Scala 2.11. The code used is given below: import org.apache.spark.SparkConf import org.apache.spark.SparkContext import org.apache.spark.sql import org.apache.spark.sql.SparkSession import…
1 vote, 1 answer

column is not a member of org.apache.spark.sql.DataFrame

I am new to Spark and I am trying to join two tables present in Hive from Scala code: import org.apache.spark.sql._ import sqlContext.implicits._ val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc) val csp = hiveContext.sql("select *…
ashwini
1 vote, 1 answer

How to prevent memory leak when testing with HiveContext in PySpark

I use PySpark to do some data processing and leverage HiveContext for the window function. In order to test the code, I use TestHiveContext, basically copying the implementation from the PySpark source…
matt hoover