Questions tagged [hivecontext]

Questions related to the HiveContext class of Apache Spark.

A variant of Spark SQL that integrates with data stored in Hive.

Configuration for Hive is read from hive-site.xml on the classpath. It supports running both SQL and HiveQL commands. As of Spark 2.0, HiveContext is deprecated in favor of SparkSession with Hive support enabled (SparkSession.builder.enableHiveSupport()).

More documentation can be found in the Spark SQL programming guide.

106 questions
Append transformed columns to spark dataframe using scala
(1 vote, 1 answer) asked by preitam ojha
I am trying to access a hive table, extract and transform certain columns from the table/dataframe, and then put those new columns in a new dataframe. I am trying to do it this way: val sqlContext = new…

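The question above is about deriving new columns from existing ones and collecting them in a new dataframe. A minimal plain-Python sketch of the semantics, with made-up column names (in Spark itself this is typically done with DataFrame.withColumn):

```python
# Plain-Python sketch of appending transformed columns to tabular rows.
# In Spark this would be e.g. df.withColumn("name_upper", upper(col("name"))).
rows = [
    {"name": "alice", "age": 30},
    {"name": "bob", "age": 25},
]

def with_transformed_columns(rows):
    """Return new rows with derived columns appended; originals are untouched."""
    out = []
    for r in rows:
        new_row = dict(r)  # copy, mirroring an immutable DataFrame
        new_row["name_upper"] = r["name"].upper()
        new_row["age_next_year"] = r["age"] + 1
        out.append(new_row)
    return out

transformed = with_transformed_columns(rows)
print(transformed[0])
# {'name': 'alice', 'age': 30, 'name_upper': 'ALICE', 'age_next_year': 31}
```

Note the copy: like a Spark transformation, this produces a new collection rather than mutating the source rows.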
Spark cache behaviour when the source table is modified
(1 vote, 0 answers) asked by outlier229
I have a hive table ("person") which is cached in Spark. sqlContext.sql("create table person (name string, age int)") //Create a new table //Add some values to the table ... ... //Cache the table in Spark sqlContext.cacheTable("person")…

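The behaviour asked about above can be illustrated without Spark: a cache is a snapshot taken at cache time, so later writes to the source stay invisible until the cache is rebuilt. A toy sketch with made-up table contents (in Spark the rebuild would be sqlContext.uncacheTable followed by cacheTable):

```python
# Sketch of cache staleness: a cached snapshot does not see later writes
# until it is explicitly refreshed (in Spark: uncacheTable / cacheTable).
table = [("alice", 30)]  # the "Hive table"
cache = None             # the "Spark cache"

def cache_table():
    global cache
    cache = list(table)  # snapshot of the table at cache time

def read_table():
    return cache if cache is not None else table

cache_table()
table.append(("bob", 25))  # modify the source table after caching

stale = read_table()       # still the old one-row snapshot
cache_table()              # refresh the cache
fresh = read_table()       # now sees both rows
print(len(stale), len(fresh))  # 1 2
```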
How to use one RDD's result to filter other RDD records?
(1 vote, 0 answers) asked by user2895589
I want to filter records from the target table whose date is greater than min(date) of the source table (with a common id in both tables). val cm_record_rdd=hiveContext.sql("select t1.* from target t1 left outer join source t2 on t1.id=t2.id") val…

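The join-then-filter logic in the question above can be sketched in plain Python: compute min(date) per id from the source table, then keep target rows with a matching id whose date is strictly greater. Table contents below are invented for illustration:

```python
# Sketch of "keep target rows whose date > min(date) of the source table,
# matching on a common id" (in Spark: join the two tables, then filter).
source = [
    {"id": 1, "date": "2016-01-05"},
    {"id": 1, "date": "2016-01-01"},
    {"id": 2, "date": "2016-02-01"},
]
target = [
    {"id": 1, "date": "2016-01-03"},
    {"id": 1, "date": "2015-12-31"},
    {"id": 2, "date": "2016-03-01"},
    {"id": 3, "date": "2016-01-01"},  # no matching id in source
]

# min(date) per id from the source table (ISO dates compare correctly as strings)
min_date = {}
for row in source:
    if row["id"] not in min_date or row["date"] < min_date[row["id"]]:
        min_date[row["id"]] = row["date"]

filtered = [r for r in target
            if r["id"] in min_date and r["date"] > min_date[r["id"]]]
# keeps (id=1, 2016-01-03) and (id=2, 2016-03-01); drops the rest
print(filtered)
```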
Spark SQL (Hive query through HiveContext) always creating 31 partitions
(1 vote, 1 answer) asked by Nitin
I am running hive queries using HiveContext from my Spark code. No matter which query I run and how much data it is, it always generates 31 partitions. Does anybody know the reason? Is there a predefined/configurable setting for it? I essentially need…

Select rows except the one that contains min value in Spark using HiveContext
(1 vote, 1 answer) asked by ps30
I have a Spark Data Frame that contains Timestamp and Machine Ids. I wish to remove the lowest timestamp value from each group. I tried the following code: sqlC <- sparkRHive.init(sc) ts_df2<- sql(sqlC,"SELECT ts,Machine FROM sdf2 EXCEPT SELECT…

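The EXCEPT approach in the question above amounts to: find the lowest timestamp per group, then exclude those rows. A plain-Python sketch of that semantics, with invented data (in Spark SQL this is also commonly done with a row_number() window ordered by timestamp):

```python
# Sketch of "drop the row with the lowest timestamp in each group".
rows = [
    {"ts": 3, "machine": "A"},
    {"ts": 1, "machine": "A"},
    {"ts": 2, "machine": "A"},
    {"ts": 5, "machine": "B"},
    {"ts": 4, "machine": "B"},
]

# lowest timestamp per machine
low = {}
for r in rows:
    if r["machine"] not in low or r["ts"] < low[r["machine"]]:
        low[r["machine"]] = r["ts"]

# keep everything except each group's minimum
kept = [r for r in rows if r["ts"] != low[r["machine"]]]
print(sorted(r["ts"] for r in kept))  # [2, 3, 5]
```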
Using Hive functions in Spark Job via hiveContext
(1 vote, 1 answer) asked by manmeet
I am using Hive 1.2 and Spark 1.4.1. The following query runs perfectly fine via the Hive CLI: hive> select row_number() over (partition by one.id order by two.id) as sk, two.id, two.name, one.name, current_date() from avant_source.one one inner join…

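For readers unfamiliar with the window function in the query above, the semantics of row_number() over (partition by … order by …) can be sketched in plain Python: group the rows, sort each group, and number its rows from 1. Data and column names below are invented:

```python
# Plain-Python sketch of ROW_NUMBER() OVER (PARTITION BY dept ORDER BY id):
# rows are numbered 1..n within each partition, in sort order.
from collections import defaultdict

rows = [
    {"dept": "one", "id": 20},
    {"dept": "one", "id": 10},
    {"dept": "two", "id": 30},
]

partitions = defaultdict(list)
for r in rows:
    partitions[r["dept"]].append(r)

numbered = []
for dept, part in partitions.items():
    for i, r in enumerate(sorted(part, key=lambda r: r["id"]), start=1):
        numbered.append({**r, "sk": i})  # "sk" as in the question's alias

print(numbered)
```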
Unable to connect to hive from scala ide using Spark
(1 vote, 0 answers) asked by sudhir
Here is my code, pom.xml, and the error; can anyone figure out the exact reason? Code: def main(args:Array[String]){ val objConf = new SparkConf().setAppName("Spark Connection").setMaster("spark://10.40.10.80:7077") var sc = new…

How to solve hiveContext in spark local mode throwing a java OOM PermGen space error
(1 vote, 0 answers) asked by hujun
When I create a hiveContext in Spark local mode using IDEA (Spark version 1.6.0), the program throws an exception, as follows: Caused by: java.lang.OutOfMemoryError: PermGen space at java.lang.ClassLoader.defineClass1(Native…

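PermGen exhaustion with HiveContext on Java 7 is commonly worked around by raising the permanent-generation limit of the driver JVM. A hedged sketch of the usual knobs (the sizes below are illustrative, not tuned values; PermGen was removed in Java 8, where this flag no longer applies):

```shell
# Raise the PermGen limit for the driver JVM (Java 7 and earlier only).
spark-submit --driver-java-options "-XX:MaxPermSize=512m" ...

# or equivalently in spark-defaults.conf:
# spark.driver.extraJavaOptions  -XX:MaxPermSize=512m
```

When launching from an IDE such as IDEA, the same -XX:MaxPermSize flag goes into the run configuration's VM options instead.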
unable to view data of hive tables after update in spark
(1 vote, 2 answers) asked by sudhir
Case: I have a table HiveTest, which is an ORC table with transactions set to true; I loaded it in the spark shell and viewed the data: var rdd= objHiveContext.sql("select * from HiveTest") rdd.show() --- Able to view data. Now I went to my hive shell or ambari…

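A frequent explanation for the symptom above (hedged, as it depends on the exact Hive and Spark versions) is that Hive ACID updates write delta files that Spark 1.x's HiveContext does not read; the updated rows only become visible to Spark after a compaction merges the deltas into base files. The table name below is taken from the question:

```sql
-- Hive ACID updates land in delta files; a major compaction rewrites
-- them into base files that HiveContext can read.
ALTER TABLE HiveTest COMPACT 'major';

-- check compaction progress
SHOW COMPACTIONS;
```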
Update first table column with another table's values in Spark SQL using HiveContext
(1 vote, 0 answers) asked by Swapnil Dixit
I want to update a column in my existing table by overwriting that column from another table. Example: there is the column name in table student, but there is another table employee, and I want to overwrite column name of the student table by column…

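Hive (without ACID) has no UPDATE … FROM with a join, so the usual approach to the question above is to rewrite the table with INSERT OVERWRITE plus a join. A hedged sketch using the question's table names; the join key and the columns other than name are hypothetical:

```sql
-- Rewrite student, taking "name" from employee instead.
-- "id" and "other_col" are illustrative; use the real schema's columns.
INSERT OVERWRITE TABLE student
SELECT s.id,
       e.name,        -- overwritten from the employee table
       s.other_col
FROM student s
JOIN employee e ON s.id = e.id;
```

Every column of student must appear in the SELECT list, since the overwrite replaces the whole table's contents.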
HiveContext in Bluemix Spark
(1 vote, 1 answer) asked by YAKOVM
In Bluemix Spark I want to use HiveContext: HqlContext = HiveContext(sc) //some code df = HqlContext.read.parquet("swift://notebook.spark/file.parquet") I get the following error: Py4JJavaError: An error occurred while calling o45.parquet. : …

Ordering of a string column that contains numbers in it using hive context
(0 votes, 1 answer) asked by TomG
I have a column called priority, among other columns in a file, that contains numbers, e.g. 1, 2, 3, 4, 5, 6. The file data is as follows: Department Strength Priority -------------------------------- CS Good 10 CS Low …

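The pitfall behind the question above is that a string column of numbers sorts lexicographically ("10" before "2"); numeric order requires a cast, e.g. ORDER BY CAST(priority AS INT) in HiveQL. The contrast in plain Python, with made-up values:

```python
# Lexicographic vs numeric ordering of numbers stored as strings.
priorities = ["10", "2", "1", "30", "4"]

lexicographic = sorted(priorities)           # string comparison
numeric = sorted(priorities, key=int)        # like CAST(priority AS INT)

print(lexicographic)  # ['1', '10', '2', '30', '4']
print(numeric)        # ['1', '2', '4', '10', '30']
```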
Spark HiveContext apply IN operation using sql method
(0 votes, 2 answers) asked by Ramesh Raj
I have an employee hive table with columns Name, Department, and City, and I want to retrieve the data based on the names of the employees using the IN operation in HiveContext.sql(), but it is throwing a pyspark AnalysisException. Please look at the example…

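A common cause of an AnalysisException with IN over string values is unquoted literals: without quotes, Spark SQL parses the names as column references. A sketch of building the clause safely in Python; in_clause is a hypothetical helper, not a Spark API:

```python
# Building an IN (...) clause over string values: each value must be
# single-quoted, or Spark SQL treats the names as column references.
def in_clause(column, values):
    """Hypothetical helper: quote each value (doubling embedded quotes)."""
    quoted = ", ".join("'{}'".format(v.replace("'", "''")) for v in values)
    return "{} IN ({})".format(column, quoted)

names = ["Alice", "Bob", "O'Brien"]
predicate = in_clause("Name", names)
print(predicate)  # Name IN ('Alice', 'Bob', 'O''Brien')
# e.g. hiveContext.sql("SELECT * FROM employee WHERE " + predicate)
```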
ImportError: cannot import name 'HiveContext' from 'pyspark.sql'
(0 votes, 1 answer) asked by user8270077
I am running pyspark on my PC (Windows 10) but I cannot import HiveContext: from pyspark.sql import HiveContext --------------------------------------------------------------------------- ImportError Traceback (most…

How to list All Databases using HiveContext in PySpark 1.6
(0 votes, 1 answer)
I am trying to list all the databases using HiveContext in Spark 1.6, but it's giving me just the default database. from pyspark import SparkContext from pyspark.sql import SQLContext sc = SparkContext.getOrCreate() from pyspark.sql import…