Questions tagged [apache-spark-1.6]

Use for questions specific to Apache Spark 1.6. For general questions related to Apache Spark, use the tag [apache-spark].

111 questions
2 votes, 1 answer

pyspark memory issue: Caused by: java.lang.OutOfMemoryError: Java heap space

Folks, I am running pyspark code to read a 500 MB file from HDFS and construct a numpy matrix from the contents of the file. Cluster info: 9 datanodes, 128 GB memory / 48 vCore CPU per node. Job config: conf = SparkConf().setAppName('test') \ …
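
The usual culprit here is collecting raw file contents onto the driver. A minimal sketch of the standard remedy, assuming the matrix is assembled on the driver with collect(); the path and memory size are hypothetical:

```python
from pyspark import SparkConf, SparkContext
import numpy as np

# Give the executors more heap; note that in client mode the driver's heap
# must be set via spark-submit --driver-memory, since the driver JVM is
# already running by the time SparkConf is read.
conf = (SparkConf()
        .setAppName('test')
        .set('spark.executor.memory', '8g'))
sc = SparkContext(conf=conf)

# Parse lines into numeric arrays on the executors so that only compact
# float arrays, not raw strings, are shipped back to the driver.
rows = (sc.textFile('hdfs:///path/to/file.txt')   # hypothetical path
          .map(lambda line: np.array(line.split(), dtype=float)))
matrix = np.vstack(rows.collect())
```
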
2 votes, 0 answers

Spark temp tables not found

I'm trying to run a pySpark job with custom inputs, for testing purposes. The job has three sets of input, each read from a table in a different metastore database. The data is read in Spark with hiveContext.table('myDb.myTable'). The test inputs…
summerbulb • 5,709 • 8 • 37 • 83
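
In Spark 1.6, a temporary table is only visible to the exact SQLContext/HiveContext instance that registered it, and it is never qualified by a database name. A minimal sketch of injecting a test fixture, assuming the job and the test share one context (names are hypothetical):

```python
from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext(appName='temp-table-test')
hiveContext = HiveContext(sc)

# Register the fixture on the SAME context the job reads from; a temp
# table created on another context instance is simply not found.
fixture = hiveContext.createDataFrame([(1, 'a'), (2, 'b')], ['id', 'val'])
fixture.registerTempTable('myTable')

# Temp tables have no database, so hiveContext.table('myDb.myTable')
# will not resolve them; the job must read the unqualified name.
df = hiveContext.table('myTable')
```
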
2 votes, 3 answers

Pyspark: How to return a list of tuples of the existing non-null columns as one of the column values in a dataframe

I'm working with a pyspark dataframe which is:
+----+----+---+---+---+----+
|   a|   b|  c|  d|  e|   f|
+----+----+---+---+---+----+
|   2|12.3|  5|5.6|  6|44.7|
|null|null|  9|9.3| 19|23.5|
|   8| 4.3|  7|0.5| 21| 8.2|
|   9| 3.8|  3|6.5| 45|…
Mia21 • 119 • 2 • 10
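
One way to express this in PySpark 1.6 is a UDF that walks all columns of a row and keeps the non-null (name, value) pairs. A sketch, assuming string-rendered tuples are acceptable and df is the dataframe above:

```python
from pyspark.sql import functions as F
from pyspark.sql.types import ArrayType, StringType

COLS = ['a', 'b', 'c', 'd', 'e', 'f']   # column names from the question

# Keep a "(name, value)" entry for every cell that is not null.
def non_null_pairs(*cells):
    return [str((name, str(v))) for name, v in zip(COLS, cells) if v is not None]

non_null_udf = F.udf(non_null_pairs, ArrayType(StringType()))

result = df.withColumn('non_null', non_null_udf(*[F.col(c) for c in COLS]))
result.show(truncate=False)
```
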
2 votes, 1 answer

Exception in thread "main" java.lang.NoClassDefFoundError: org/ejml/simple/SimpleBase

It seems it's missing the Java library Efficient Java Matrix Library (ejml), so I have downloaded it from the sources here. I'm building an executable JAR with Maven and running it on an OpenStack EDP Spark environment. I'm having trouble figuring out how to…
2 votes, 1 answer

Why does importing SparkSession in spark-shell fail with "object SparkSession is not a member of package org.apache.spark.sql"?

I use Spark 1.6.0 on my VM, a Cloudera machine. I'm trying to insert some data into a Hive table from the Spark shell. To do that, I am trying to use SparkSession. But the import below is not working. scala> import…
Metadata • 2,127 • 9 • 56 • 127
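
SparkSession does not exist in Spark 1.6 (it was introduced in 2.0), so this import can never resolve; the 1.6 entry point for Hive tables is HiveContext. The question is in Scala, but the same steps in PySpark terms look roughly like this (the table name is hypothetical):

```python
from pyspark import SparkContext
from pyspark.sql import HiveContext

# Spark 1.6: HiveContext, not SparkSession, gives access to Hive tables.
sc = SparkContext(appName='hive-insert')
hiveContext = HiveContext(sc)

df = hiveContext.createDataFrame([(1, 'x')], ['id', 'val'])
df.write.mode('append').saveAsTable('mydb.mytable')   # hypothetical table
```
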
2 votes, 1 answer

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/sql/SQLContext

I am using IntelliJ IDEA 2016.3.

import sbt.Keys._
import sbt._

object ApplicationBuild extends Build {
  object Versions {
    val spark = "1.6.3"
  }
  val projectName = "example-spark"
  val common = Seq(
    version := "1.0",
    …
2 votes, 2 answers

How to use a different Hive metastore for saveAsTable?

I am using Spark SQL (Spark 1.6.1) with PySpark, and I have a requirement to load a table from one Hive metastore and write the resulting dataframe into a different Hive metastore. I am wondering how I can use two different metastores for…
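
A HiveContext in Spark 1.6 is bound to the single metastore in its hive-site.xml, so a common workaround is to stage the data as files and register them on the other side. A rough sketch, with hypothetical paths and table names:

```python
from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext(appName='two-metastores')
hive = HiveContext(sc)   # bound to the metastore from hive-site.xml

# Read from the first metastore and stage the result as plain files;
# files are the neutral hand-off between the two metastores.
df = hive.table('sourcedb.sourcetable')
df.write.mode('overwrite').parquet('hdfs:///tmp/staging/sourcetable')

# Then, in a job configured against the second metastore, register them:
# CREATE EXTERNAL TABLE targetdb.targettable (...) STORED AS PARQUET
#   LOCATION 'hdfs:///tmp/staging/sourcetable'
```
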
2 votes, 2 answers

How to read a space-delimited text file and save it to Hive?

I have a string like below. The first row is the header, and the rest are the column values. I want to create a dataframe (Spark 1.6 and Java 7) from the String, and convert the values under col3 and col4 to DOUBLE. col1 col2 col3 col4 col5 val1…
John Thomas • 212 • 3 • 21
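
The question asks for Java 7, but the shape of the answer is the same in any Spark 1.6 API: split off the header, split each line on whitespace, cast the two columns, and save. A PySpark sketch, assuming sc and sqlContext already exist (path and table name are hypothetical):

```python
from pyspark.sql import functions as F

raw = sc.textFile('hdfs:///path/input.txt')   # hypothetical path
header = raw.first()
cols = header.split()

# Drop the header row and split the rest into string cells.
rows = raw.filter(lambda l: l != header).map(lambda l: l.split())
df = sqlContext.createDataFrame(rows, cols)

# Cast col3 and col4 to DOUBLE, then persist to Hive.
df = (df.withColumn('col3', F.col('col3').cast('double'))
        .withColumn('col4', F.col('col4').cast('double')))
df.write.saveAsTable('mydb.space_delimited')   # hypothetical table
```
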
2 votes, 2 answers

How to do GROUP BY on an exploded field in Spark SQL?

Zeppelin 0.6, Spark 1.6, SQL. I am trying to find the top 20 most common words in some tweets. filtered contains an array of words for each tweet. The following: select explode(filtered) AS words from tweettable lists each word as you would expect,…
schoon • 2,858 • 3 • 46 • 78
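
The trick is to explode in a subquery (or a registered temp table) and group by the alias; grouping directly on explode(...) in the same SELECT is what fails. A sketch, assuming a sqlContext as in the Zeppelin notebook:

```python
# Explode into a derived table first, then aggregate the alias.
top20 = sqlContext.sql("""
    SELECT words, COUNT(*) AS cnt
    FROM (SELECT explode(filtered) AS words FROM tweettable) t
    GROUP BY words
    ORDER BY cnt DESC
    LIMIT 20
""")
top20.show()
```
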
1 vote, 1 answer

Convert a (String, List[(String, String)]) to JSON object

I have the data as: (ID001,List((BookType,[text]),(author,xyz abc),(time,01/12/2019[22:00] CST/PM))),(ID002,List((BookType,[text]),(author,klj fgh),(time,19/02/2019[12:00] CST/AM))) I need to convert this to a JSON object: {"ID001":{ …
chris • 43 • 2
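
The question is Scala, but the transformation itself is just "fold each (key, value) list into a map keyed by the record ID". A Python sketch of that shape, using the sample records from the question:

```python
import json

records = [
    ('ID001', [('BookType', '[text]'), ('author', 'xyz abc'),
               ('time', '01/12/2019[22:00] CST/PM')]),
    ('ID002', [('BookType', '[text]'), ('author', 'klj fgh'),
               ('time', '19/02/2019[12:00] CST/AM')]),
]

# Fold each key/value list into a dict keyed by the record ID.
obj = {rid: dict(kvs) for rid, kvs in records}
print(json.dumps(obj))
# {"ID001": {"BookType": "[text]", "author": "xyz abc", ...}, ...}
```
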
1 vote, 1 answer

How to display a mismatch report with a label in Spark 1.6 using Scala's except function?

Consider there are 2 dataframes, df1 and df2. df1 has the below data:

A | B
-------
1 | m
2 | n
3 | o

df2 has the below data:

A | B
-------
1 | m
2 | n
3 | p

df1.except(df2) returns:

A | B
-------
3 | o
3 | p

How to display the result as…
voidpro • 1,652 • 13 • 27
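
PySpark 1.6's subtract() is the counterpart of Scala's except(), so the labelled report can be built from two subtractions and a union. A sketch, with df1 and df2 as in the question:

```python
from pyspark.sql import functions as F

# Rows only in df1 and rows only in df2, each tagged with its origin.
only_in_df1 = df1.subtract(df2).withColumn('source', F.lit('df1'))
only_in_df2 = df2.subtract(df1).withColumn('source', F.lit('df2'))

report = only_in_df1.unionAll(only_in_df2)
report.show()
# +---+---+------+
# |  A|  B|source|
# +---+---+------+
# |  3|  o|   df1|
# |  3|  p|   df2|
# +---+---+------+
```
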
1 vote, 2 answers

Repartition() causes spark job to fail

I have a Spark job that runs fine with the below code. However, this step creates several files in the output folder. sampledataframe.write.mode('append').partitionBy('DATE_FIELD').save(FILEPATH) So I have started to use the below line of code to…
Bob • 335 • 1 • 4 • 16
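
If the goal is merely fewer output files, coalesce() is usually the safer tool: it narrows to fewer partitions without the full shuffle that repartition() triggers. A sketch, with the dataframe and path from the question (the partition count is hypothetical):

```python
# coalesce(10) reduces the number of output files without
# repartition()'s cluster-wide shuffle.
(sampledataframe
    .coalesce(10)
    .write.mode('append')
    .partitionBy('DATE_FIELD')
    .save(FILEPATH))
```
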
1 vote, 1 answer

Pyspark - DataFrame persist() errors out with java.lang.OutOfMemoryError: GC overhead limit exceeded

A Pyspark job fails when I try to persist a DataFrame that was created on a table of size ~270 GB, with the error: Exception in thread "yarn-scheduler-ask-am-thread-pool-9" java.lang.OutOfMemoryError: GC overhead limit exceeded. This issue happens only…
Sam • 17 • 5
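
For inputs in this size range, a serialized, disk-spilling storage level often avoids the GC death spiral of keeping everything on the heap. A sketch, with df as the dataframe from the question:

```python
from pyspark import StorageLevel

# MEMORY_AND_DISK_SER stores serialized blocks and spills what does not
# fit to disk, instead of forcing ~270 GB worth of objects into the heap.
df.persist(StorageLevel.MEMORY_AND_DISK_SER)
df.count()   # materialize the cache once, then reuse df
```
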
1 vote, 0 answers

Spark 1.6 - Overwriting a directory of Avro files with dataframes fails

I have a directory in HDFS which contains Avro files. When I try to overwrite the directory with a dataframe, it fails. Syntax: avroData_df.write.mode(SaveMode.Overwrite).format("com.databricks.spark.avro").save("") The error is: Caused by:…
Mnav505 • 13 • 3
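
A frequent cause is reading from and overwriting the same directory in one job, which Spark 1.6 refuses because the overwrite deletes the input mid-read. A sketch of the common staging workaround, with hypothetical paths and an assumed sqlContext:

```python
df = sqlContext.read.format('com.databricks.spark.avro').load('/data/avro/in')

result = df.filter(df['some_col'].isNotNull())   # hypothetical transformation

# Write to a staging directory, then swap it over the original afterwards
# (e.g. with hdfs dfs -mv); overwriting the input path directly fails.
(result.write
       .mode('overwrite')
       .format('com.databricks.spark.avro')
       .save('/data/avro/in_tmp'))
```
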
1 vote, 1 answer

Spark Streaming 1.6 + Kafka: Too many batches in "queued" status

I'm using Spark Streaming to consume messages from a Kafka topic which has 10 partitions. I'm using the direct approach to consume from Kafka, and the code can be found below: def createStreamingContext(conf: Conf): StreamingContext = { val…
Jorge Cespedes • 547 • 1 • 11 • 21
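
Batches pile up in "queued" when each batch takes longer to process than the batch interval. Enabling backpressure and capping the per-partition intake are the usual first steps; a PySpark sketch of the same direct-stream setup (broker, topic, rate cap, and interval are hypothetical):

```python
from pyspark import SparkConf, SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

conf = (SparkConf()
        .setAppName('kafka-direct')
        .set('spark.streaming.backpressure.enabled', 'true')
        .set('spark.streaming.kafka.maxRatePerPartition', '1000'))

sc = SparkContext(conf=conf)
ssc = StreamingContext(sc, 10)   # 10-second batch interval

# Direct stream over the 10-partition topic; Spark creates one RDD
# partition per Kafka partition, so parallelism follows the topic.
stream = KafkaUtils.createDirectStream(
    ssc, ['mytopic'], {'metadata.broker.list': 'broker:9092'})
```
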