Questions tagged [hivecontext]

Questions related to the HiveContext class of Apache Spark.

A variant of Spark SQL that integrates with data stored in Hive.

Configuration for Hive is read from hive-site.xml on the classpath. It supports running both SQL and HiveQL commands.

More documentation can be found in the Spark SQL and DataFrame Guide.

106 questions
How to create schema for Spark SQL for Array of struct? (1 vote, 2 answers)

How do I create a schema for the JSON below so it can be read? I am using hiveContext.read.schema().json("input.json"), and I want to ignore the first two fields, "ErrorMessage" and "IsError", and read only "Report". Below is the JSON: { "ErrorMessage": null, …

asked by Divya
Create hive table through spark job (1 vote, 1 answer)

I am trying to create Hive tables as outputs of my Spark (version 1.5.1) job on a Hadoop cluster (BigInsights 4.1 distribution) and am facing permission issues. My guess is that Spark is using a default user (in this case 'yarn' and not the job…

asked by Hatak
Join two tables using pyspark hive context (1 vote, 0 answers)

I am seeing the below error when joining two Hive tables using the PySpark HiveContext. error: """) File "/usr/hdp/2.3.4.7-4/spark/python/lib/pyspark.zip/pyspark/sql/context.py", line 552, in sql File …

asked by user8617180
Spark HiveContext: Insert Overwrite the same table it is read from (1 vote, 2 answers)

I want to apply SCD1 and SCD2 using PySpark with a HiveContext. In my approach, I read the incremental data and the target table, then join them for the upsert. I call registerTempTable on all the source dataframes. I am trying…

asked by Manu Gupta
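
Spark's HiveContext rejects an INSERT OVERWRITE into a table that the same query reads from, so a common workaround is to materialize the merged result into a staging table first and overwrite afterwards. A sketch, with "target_scd" and "staging_upsert" as hypothetical table names:

```python
# Materialize-then-overwrite sketch; table names are hypothetical.
upsert_sql = "INSERT OVERWRITE TABLE target_scd SELECT * FROM staging_upsert"

# merged = hiveContext.sql("SELECT ... FROM increments i JOIN target_scd t ON ...")
# merged.write.mode("overwrite").saveAsTable("staging_upsert")  # break the cycle
# hiveContext.sql(upsert_sql)                                   # now safe to overwrite
```
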
Filter out null strings and empty strings in hivecontext.sql (1 vote, 4 answers)

I'm using pyspark and hivecontext.sql, and I want to filter out all null and empty values from my data. I used simple SQL commands to first filter out the null values, but it doesn't work. My code: hiveContext.sql("select column1 from table where…

asked by Thaise
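
One robust form tests both conditions explicitly, with trim() to also catch whitespace-only strings. A sketch, reusing the placeholder names from the snippet:

```python
# Explicit filter for both NULLs and empty/blank strings;
# "table1"/"column1" are placeholders matching the question.
query = """
    SELECT column1
    FROM table1
    WHERE column1 IS NOT NULL
      AND trim(column1) != ''
"""
# hiveContext.sql(query)
```
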
Spark sql read json file from hdfs failed (1 vote, 1 answer)

My code looks like this: val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc) import sqlContext.implicits._ val customers = sqlContext.read.json("jsonfilepath") In spark-shell an error occurs that I cannot understand: 17/06/19 09:59:04 ERROR…

asked by shaojie
org.apache.spark.shuffle.FetchFailedException Adjusted frame length exceeds 2147483647 (1 vote, 0 answers)

Getting the below exception when I try to convert the DataFrames to JavaRDDs: 17/05/22 09:18:41 WARN TaskSetManager: Lost task 57.0 in stage 29.0 (TID 3985, dayrhectod011.enterprisenet.org): FetchFailed(BlockManagerId(4,…
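
The number in the message is Integer.MAX_VALUE bytes: Spark's shuffle cannot transfer a single block larger than 2 GB, which usually points at too few partitions or a heavily skewed key. A sketch of the usual mitigation, with the partition count as an illustrative guess:

```python
# 2147483647 bytes == Integer.MAX_VALUE == 2**31 - 1: the hard cap on a
# single shuffle frame, so the remedy is more (or less skewed) partitions.
FRAME_LIMIT = 2147483647
assert FRAME_LIMIT == 2**31 - 1

# Illustrative mitigation (2000 is a guess; tune to your data volume):
# df = df.repartition(2000)
# sqlContext.setConf("spark.sql.shuffle.partitions", "2000")
```
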
how to get an integer value while querying a count query through DataFrame? (1 vote, 2 answers)

I am writing this code to get the integer value of a count on a specified table: sc = SparkContext("local", "spar") hive_context = HiveContext(sc) hive_context.sql("use zs_trainings_trainings_db") df = hive_context.sql("select count(*) from ldg_sales")
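
hive_context.sql returns a one-row DataFrame for a count query, not a number, so the scalar has to be pulled out of the first Row. A sketch; aliasing the count makes the column addressable by name:

```python
# One-row result; the integer is the first column of the first Row.
count_sql = "SELECT count(*) AS cnt FROM ldg_sales"
# df = hive_context.sql(count_sql)
# row_count = df.first()[0]       # first Row, first column -> Python int
# row_count = df.first()["cnt"]   # equivalent, via the alias
```
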
Spark: Not able to read data from hive tables (1 vote, 2 answers)

I have created a maven project with a pom.xml containing: 1.3.0 org.apache.spark spark-core_2.11 ${spark.version}

asked by sachingupta
Running Spark job locally with HiveContext throwing error (1 vote, 0 answers)

I am running Spark jobs locally for debugging purposes. I have imported the spark-core jar file using sbt, and I am using a HiveContext in my code. It is throwing the following error: The root scratch dir: /tmp/hive on HDFS should be writable. Current…

asked by hp2326
Executing OLAP functions with Spark SQL (1 vote, 1 answer)

I am working with Spark version 1.6. I want to execute OLAP functions, including CUBE, ROLLUP, and GROUPING SETS, through SQL queries on Spark. I understand that the cube and rollup functions are available in the dataframe API, but how can I execute them…

asked by Andy Dufresne
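
With a HiveContext, these constructs can be written directly in the SQL string, since the HiveQL parser understands them. A sketch with a hypothetical sales(region, product, amount) table:

```python
# ROLLUP and GROUPING SETS expressed in plain HiveQL; the table and
# column names are hypothetical.
rollup_sql = """
    SELECT region, product, sum(amount) AS total
    FROM sales
    GROUP BY region, product WITH ROLLUP
"""
grouping_sets_sql = """
    SELECT region, product, sum(amount) AS total
    FROM sales
    GROUP BY region, product GROUPING SETS ((region), (region, product))
"""
# hiveContext.sql(rollup_sql)
# hiveContext.sql(grouping_sets_sql)
```
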
SparkJob file name (1 vote, 1 answer)

I'm using an HQL query that contains something similar to... INSERT OVERWRITE TABLE ex_tb.ex_orc_tb select *, SUBSTR(INPUT__FILE__NAME,60,4), CONCAT_WS('-', SUBSTR(INPUT__FILE__NAME,71,4), SUBSTR(INPUT__FILE__NAME,75,2),…

asked by firestreak
Issues with reading external hive partitioned table using spark hivecontext (1 vote, 1 answer)

I have an external Hive partitioned table which I'm trying to read from Spark using HiveContext, but I'm getting null values. val maxClose = hiveContext.sql("select max(Close) from stock_partitioned_data where symbol = 'AAPL'"); …

asked by Charls Joseph
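
Two usual suspects worth ruling out with external partitioned tables, sketched below: partitions added on HDFS but never registered in the metastore (the query then sees no rows), and a column or serde mismatch between the table definition and the files (the query then sees rows but NULL values):

```python
# 1) Register partitions that exist on HDFS but not in the metastore:
repair_sql = "MSCK REPAIR TABLE stock_partitioned_data"
# hiveContext.sql(repair_sql)

# 2) Check that the declared columns line up with the file layout;
#    misaligned columns read back as NULL:
# hiveContext.sql("DESCRIBE FORMATTED stock_partitioned_data").show(100, False)
```
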
HiveException: Failed to create spark client (1 vote, 1 answer)

1) I have created a SQL file where we collect the data from two different Hive tables and insert it into a single Hive table. 2) We invoke this SQL file using a shell script. 3) Sample Spark settings: SET hive.execution.engine=spark; SET…

asked by RITESH KUMAR
How do I increase the number of partitions when I read in a hive table in Spark (1 vote, 0 answers)

I am trying to read in a Hive table in Spark with hiveContext. The job reads data from two tables into two DataFrames, which are subsequently converted to RDDs. I then join them on a common key. However, this join is failing…

asked by MV23
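
A sketch of the two common knobs, with illustrative numbers: spark.sql.shuffle.partitions sets the parallelism of the join's shuffle, and repartition() re-splits a DataFrame right after the read (table names are hypothetical):

```python
# Illustrative value; tune to cluster size and data volume.
NUM_PARTITIONS = 400

# hiveContext.setConf("spark.sql.shuffle.partitions", str(NUM_PARTITIONS))
# df1 = hiveContext.table("db.table_a").repartition(NUM_PARTITIONS)
# df2 = hiveContext.table("db.table_b").repartition(NUM_PARTITIONS)
# joined = df1.join(df2, "common_key")
```
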