Questions tagged [hivecontext]

Questions related to the HiveContext class of Apache Spark

A variant of Spark SQL that integrates with data stored in Hive.

Configuration for Hive is read from hive-site.xml on the classpath. It supports running both SQL and HiveQL commands.

More documentation can be found in the Spark SQL programming guide and the HiveContext API documentation.

106 questions
0 votes, 2 answers

How to get HiveContext from JavaSparkContext

In some Spark code, I have seen programmers use code like this to create a SparkContext: SparkSession session = SparkSession.builder().appName("Spark Hive Example").config("spark.sql.warehouse.dir", warehouseLocation) …
Vinay Limbare • 151 • 2 • 16
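
A minimal Scala sketch of the usual pattern, assuming Spark 1.6: a HiveContext is constructed from the plain SparkContext, which a JavaSparkContext exposes via its sc field (jsc.sc() in Java).

    import org.apache.spark.SparkConf
    import org.apache.spark.api.java.JavaSparkContext
    import org.apache.spark.sql.hive.HiveContext

    val conf = new SparkConf().setAppName("Spark Hive Example")
    val jsc  = new JavaSparkContext(conf)

    // HiveContext is built from the plain SparkContext that the
    // JavaSparkContext wraps; in Java this is new HiveContext(jsc.sc()).
    val hiveContext = new HiveContext(jsc.sc)

    hiveContext.sql("SHOW TABLES").show()
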
0 votes, 1 answer

Unable to write data to Hive using Spark

I am using Spark 1.6 and creating a HiveContext from the SparkContext. When I save data into Hive it gives an error. I am using the Cloudera VM: Hive is inside the Cloudera VM and Spark is on my own machine, and I can access the VM by IP. I have started the…
lucy • 4,136 • 5 • 30 • 47
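
A minimal Scala sketch of the write path, assuming Spark 1.6 and that hive-site.xml (pointing at the VM's metastore) is on the driver classpath; the table name results is made up for illustration.

    import org.apache.spark.sql.SaveMode
    import org.apache.spark.sql.hive.HiveContext

    val hiveContext = new HiveContext(sc)   // sc: the existing SparkContext

    // hive-site.xml on the driver classpath must point hive.metastore.uris
    // at the VM's metastore; otherwise Spark falls back to a local embedded
    // Derby metastore and the table never appears in the VM's Hive.
    val df = hiveContext.createDataFrame(Seq((1, "a"), (2, "b"))).toDF("id", "value")

    df.write
      .mode(SaveMode.Append)
      .saveAsTable("results")               // hypothetical Hive table name
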
0 votes, 2 answers

Hive Merge command is not working in Spark HiveContext

I am running a Hive MERGE command using the Spark HiveContext on Spark 1.6.3, but it is failing with the error below. 2017-09-11 18:30:33 Driver [INFO ] ParseDriver - Parse Completed 2017-09-11 18:30:34 Driver [INFO ] ParseDriver - Parsing command:…
Hokam • 924 • 7 • 19
0 votes, 1 answer

Dataframe in pyspark - How to apply aggregate functions to two columns?

I'm using DataFrames in pyspark. I have one table like Table 1 below and I need to obtain Table 2, where: num_category is how many different categories there are for each id, and sum(count) is the sum of the third column in Table 1 for each id.…
Thaise • 1,043 • 3 • 16 • 28
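
The question is pyspark, but the aggregation reads almost the same in Scala (the pyspark groupBy/agg API mirrors it). A minimal sketch, assuming the Table 1 columns are named id, category and count, and that table1 is the DataFrame:

    import org.apache.spark.sql.functions.{countDistinct, sum}

    val table2 = table1                                  // table1: the DataFrame from Table 1
      .groupBy("id")
      .agg(
        countDistinct("category").as("num_category"),    // distinct categories per id
        sum("count").as("sum(count)")                    // sum of the third column per id
      )

    table2.show()
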
0 votes, 0 answers

Task of very large size in Spark

I have a program which converts an Excel file to a Spark DataFrame and then writes it to our data lake in compressed ORC format. Note that I am constrained to the Spark 1.6.2 API. Variable sq is a HiveContext; variable schema contains a…
sweeeeeet • 1,769 • 4 • 26 • 50
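
The "task of very large size" warning usually means each serialized task carries a big slice of driver-side data, which is easy to hit when a whole Excel file is parsed on the driver and then parallelized. A minimal Spark 1.6 sketch of one mitigation; the helper name and the partition count are made up:

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.hive.HiveContext
    import org.apache.spark.sql.types.StructType

    // rows and schema are assumed to come from the Excel parsing step.
    def writeAsOrc(sq: HiveContext, rows: Seq[Row], schema: StructType, path: String): Unit = {
      // Spreading the driver-side rows over more partitions keeps the
      // serialized size of each individual task small.
      val rdd = sq.sparkContext.parallelize(rows, numSlices = 64)
      val df  = sq.createDataFrame(rdd, schema)
      df.write.mode("overwrite").orc(path)   // ORC writer is available through HiveContext in 1.6
    }
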
0 votes, 1 answer

LAG function under HIVE CONTEXT is throwing java.lang.NullPointerException

The script below (Spark 1.6) aborts with java.lang.NullPointerException, primarily due to the function LAG. Please advise. from pyspark.sql import HiveContext sqlc= HiveContext(sc) rdd = sc.parallelize([(1, 65), (2, 66), (3, 65), (4, 68), (5,…
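
In Spark 1.6, window functions such as lag require a HiveContext and an explicit ordering in the window spec; the first row of the window simply gets null rather than an exception. A minimal Scala sketch of the working pattern (the original is pyspark), with column names id and value taken from the sample data:

    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.lag

    // hiveContext: an existing HiveContext
    val df = hiveContext
      .createDataFrame(Seq((1, 65), (2, 66), (3, 65), (4, 68)))
      .toDF("id", "value")

    // lag needs an ordered window; without orderBy the analysis fails.
    val w = Window.orderBy("id")
    df.withColumn("prev_value", lag("value", 1).over(w)).show()
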
0 votes, 1 answer

Issue in inserting data to Hive Table using Spark and Scala

I am new to Spark. Here is what I want to do. I have created two data streams; the first one reads data from a text file and registers it as a temp table using hivecontext. The other one continuously gets RDDs from Kafka, and for each RDD it creates…
omer • 187 • 6 • 16
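
A minimal Scala sketch of the Kafka side, assuming Spark 1.6 Streaming, a DStream of (key, value) string pairs, and a pre-existing Hive table named events (all of these are assumptions); the main point is to reuse one HiveContext across micro-batches instead of building a new one per RDD:

    import org.apache.spark.SparkContext
    import org.apache.spark.sql.hive.HiveContext
    import org.apache.spark.streaming.dstream.DStream

    case class Event(key: String, line: String)                 // hypothetical record type

    // Lazily create a single HiveContext and reuse it for every micro-batch.
    object HiveContextSingleton {
      @transient private var instance: HiveContext = _
      def get(sc: SparkContext): HiveContext = synchronized {
        if (instance == null) instance = new HiveContext(sc)
        instance
      }
    }

    def saveToHive(kafkaStream: DStream[(String, String)]): Unit =
      kafkaStream.foreachRDD { rdd =>
        if (!rdd.isEmpty()) {
          val hc = HiveContextSingleton.get(rdd.sparkContext)
          import hc.implicits._
          val df = rdd.map { case (k, v) => Event(k, v) }.toDF()
          df.write.mode("append").insertInto("events")           // "events" must already exist in Hive
        }
      }
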
0 votes, 0 answers

HiveContext and SQLContext in Local Mode

I am developing Spark jobs on my local machine and later deploying them to the cluster for a full run. I have created a common library that other people use in their code. In this code, I have to use HiveContext for Spark SQL, which many people suggested…
hp2326 • 181 • 1 • 3 • 12
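
A HiveContext does not require a Hive installation: in local mode it falls back to an embedded Derby metastore and a local warehouse directory. A minimal sketch, assuming Spark 1.6:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    val conf = new SparkConf().setMaster("local[*]").setAppName("local-hivecontext-test")
    val sc   = new SparkContext(conf)

    // Without a hive-site.xml on the classpath this creates an embedded
    // Derby metastore (metastore_db/) in the working directory.
    val hiveContext = new HiveContext(sc)
    hiveContext.sql("CREATE TABLE IF NOT EXISTS t (id INT)")
    hiveContext.sql("SHOW TABLES").show()
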
0 votes, 1 answer

Spark job that uses HiveContext failing in Oozie

In one of our pipelines we do aggregation using Spark (Java), orchestrated with Oozie. The pipeline writes the aggregated data to an ORC file using the following lines: HiveContext hc = new HiveContext(sc); DataFrame modifiedFrame…
sudharshan r • 45 • 1 • 7
0 votes, 1 answer

Spark SQL partition pruning for a cached table

Is partition pruning enabled for cached temp tables in Apache Spark? If so, how do I configure it? My data is a bunch of sensor readings from different installations; one row contains installationName, tag, timestamp and value. I have written the data…
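
For context, a cached temp table is stored as in-memory columnar batches, and Spark can skip batches using per-batch column statistics; the setting below (assumed to default to true in 1.6) controls that, while directory-style partition pruning only applies before the data is cached. A minimal sketch with a made-up path and table name:

    // hiveContext: an existing HiveContext.
    // Batch-level skipping inside cached relations (assumed default: true).
    hiveContext.setConf("spark.sql.inMemoryColumnarStorage.partitionPruning", "true")

    val readings = hiveContext.read.parquet("/data/sensor_readings")   // hypothetical path
    readings.registerTempTable("readings")
    hiveContext.cacheTable("readings")

    // Filters on installationName can then skip cached batches whose
    // min/max statistics rule them out.
    hiveContext.sql(
      "SELECT tag, `timestamp`, value FROM readings WHERE installationName = 'plant-1'"
    ).show()
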
0 votes, 1 answer

object HiveContext in package hive cannot be accessed in package

Hi coders, I'm back again. I'm trying to create a Hive table from a DataFrame using HiveContext in my Scala code. I'm able to do it with sqlContext, but when it comes to HiveContext it throws this error: [error]…
jack AKA karthik • 885 • 3 • 15 • 30
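
Whatever the exact trigger here, HiveContext lives in the separate spark-hive module, and its version has to line up with spark-core and spark-sql on the compile classpath. A minimal sbt sketch; the versions are assumptions to be matched to the cluster:

    // build.sbt -- versions are assumptions, match them to your cluster
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % "1.6.3" % "provided",
      "org.apache.spark" %% "spark-sql"  % "1.6.3" % "provided",
      "org.apache.spark" %% "spark-hive" % "1.6.3" % "provided"  // provides org.apache.spark.sql.hive.HiveContext
    )
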
0 votes, 2 answers

Running the hive queries from a file through code via sparkcontext or hivecontext (not through command line)

Consider that there are a few Hive queries in a file; my goal is to run the file using hivecontext or sparkcontext. From the command line I can do that with hive -f 'filepath/filename', but I have to run it via code (hivecontext or sparkcontext). Can anybody help…
Siva kumar • 11 • 4
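
A minimal Scala sketch of one way to do the hive -f equivalent in code, assuming the file holds semicolon-separated HiveQL statements with no semicolons inside string literals; the path is made up:

    import scala.io.Source
    import org.apache.spark.sql.hive.HiveContext

    val hiveContext = new HiveContext(sc)            // sc: the existing SparkContext

    val statements = Source.fromFile("/path/to/queries.hql").mkString
      .split(";")                                    // naive split, one statement per element
      .map(_.trim)
      .filter(_.nonEmpty)

    // Unlike `hive -f`, HiveContext.sql takes a single statement at a time.
    statements.foreach(hiveContext.sql)
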
0 votes, 2 answers

java.lang.NoSuchMethodError: org.apache.spark.sql.hive.HiveContext.sql(Ljava/lang/String;)Lorg/apache/spark/sql/DataFrame

I'm getting the error below while running a Spark program using spark-submit. My Spark cluster is version 2.0.0, I use sbt to compile my code, and below are my sbt dependencies. libraryDependencies ++= Seq( "commons-io" % "commons-io" % "2.4", …
Charls Joseph • 141 • 2 • 9
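
That particular NoSuchMethodError is the classic symptom of compiling against Spark 1.x, where HiveContext.sql returns a DataFrame, and running on a 2.0.0 cluster, where the same method returns Dataset[Row]. A minimal sketch of the 2.0-style entry point, assuming the sbt dependencies are also moved to 2.0.0:

    import org.apache.spark.sql.SparkSession

    // Spark 2.0 replacement for new HiveContext(sc).
    val spark = SparkSession.builder()
      .appName("Spark Hive Example")
      .enableHiveSupport()            // needs spark-hive 2.0.0 on the classpath
      .getOrCreate()

    // In 2.0, sql() returns Dataset[Row]; DataFrame is just a type alias.
    spark.sql("SHOW TABLES").show()
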
0 votes, 0 answers

comparing dataframes to import incremental data in spark and scala issue

I have derived a dataframe from Oracle using SQLContext and registered it as temp table tb1. I have another dataframe derived from Hive using HiveContext and registered it as table tb2. When I try to access these two tables…
roh • 1,033 • 1 • 11 • 19
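
Temp tables are scoped to the context that registered them, so a table registered through SQLContext is not visible to a query run through HiveContext. A minimal Scala sketch that routes both sources through one HiveContext; the JDBC details, table names and join key are made up:

    import org.apache.spark.sql.hive.HiveContext

    val hiveContext = new HiveContext(sc)            // sc: the existing SparkContext

    // HiveContext extends SQLContext, so the Oracle JDBC read works here too.
    val oracleDf = hiveContext.read.format("jdbc")
      .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL")
      .option("dbtable", "SRC_TABLE")
      .option("user", "scott")
      .option("password", "tiger")
      .load()
    oracleDf.registerTempTable("tb1")

    hiveContext.sql("SELECT * FROM target_db.target_table").registerTempTable("tb2")

    // Both names now resolve in the same catalog, e.g. for an incremental diff.
    val incremental = hiveContext.sql(
      "SELECT t1.* FROM tb1 t1 LEFT JOIN tb2 t2 ON t1.id = t2.id WHERE t2.id IS NULL")
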
0 votes, 0 answers

connect to HiveMetaStore from HiveContext

I'm doing some tests on tables I created via HiveContext.sql(). Is there any way I can connect to the underlying Hive metastore using org.apache.hadoop.hive.metastore.HiveMetaStoreClient? I tried to initialize HiveMetaStoreClient(hiveContext.hiveconf()) but I…
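
If hiveContext.hiveconf() is not accessible from outside Spark's hive package, one workaround is to build a HiveConf straight from the same hive-site.xml on the classpath and hand that to the metastore client. A minimal sketch, assuming the tables were created with CREATE TABLE (temp tables registered with registerTempTable never reach the metastore):

    import org.apache.hadoop.hive.conf.HiveConf
    import org.apache.hadoop.hive.metastore.HiveMetaStoreClient
    import scala.collection.JavaConverters._

    // new HiveConf() picks up hive-site.xml from the classpath, i.e. the
    // same metastore configuration the HiveContext itself is using.
    val client = new HiveMetaStoreClient(new HiveConf())

    client.getAllTables("default").asScala.foreach(println)
    client.close()
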