Questions tagged [apache-spark-2.0]
Use for questions specific to Apache Spark 2.0. For general questions related to Apache Spark, use the tag [apache-spark].
464 questions
0
votes
1 answer
Unable to set config in spark-submit from command line
I am trying to set the master URL in the application JAR using the code below:
val spark = SparkSession
  .builder()
  .master("spark://master:7077")
  .appName("TestApp")
  .config("spark.sql.warehouse.dir", "/tmp/spark-warehouse")
  .getOrCreate()
I try to…

sjuggernaut
- 1
- 1
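If the intent is to choose the master at submit time, a minimal sketch under that assumption: Spark gives properties set directly in the application precedence over spark-submit flags, so the .master() call is dropped here and the URL is passed on the command line instead. The object name and warehouse path are illustrative.

import org.apache.spark.sql.SparkSession

object TestApp {
  def main(args: Array[String]): Unit = {
    // No .master() here: values hard-coded in the app override spark-submit,
    // so the master is left to the command line.
    val spark = SparkSession
      .builder()
      .appName("TestApp")
      .config("spark.sql.warehouse.dir", "/tmp/spark-warehouse")
      .getOrCreate()
    // ... application logic ...
    spark.stop()
  }
}

It would then be submitted with something like: spark-submit --master spark://master:7077 --class TestApp app.jar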
0
votes
1 answer
StackOverflowError while using distinct in Apache Spark
I use Spark 2.0.1.
I am trying to find distinct values in a JavaRDD as below
JavaRDD distinct_installedApp_Ids = filteredInstalledApp_Ids.distinct();
This line throws the exception below:
Exception in thread "main"…

Sathiya Narayanan
- 623
- 6
- 27
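The question is truncated, but a StackOverflowError raised by an RDD action is often a symptom of a very long lineage rather than of distinct itself. A hedged sketch under that assumption, with illustrative names: checkpointing truncates the lineage before distinct runs.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("DistinctSketch").getOrCreate()
val sc = spark.sparkContext
sc.setCheckpointDir("/tmp/spark-checkpoints")  // any reliable path works

// Stand-in for filteredInstalledApp_Ids, whose real lineage may be long.
val filteredIds = sc.parallelize(Seq("app1", "app2", "app1", "app3"))
filteredIds.checkpoint()                        // cut the lineage chain
val distinctIds = filteredIds.distinct()
distinctIds.collect().foreach(println)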
0
votes
0 answers
Spark DataFrame to SQL Server storing wrong data for multiple records
The data prints correctly with dataframe.show, but in the database the previous value is being stored.
For example we have 3 records:
orderId | ItemSequence | OriginalId | price | groupId
dddeff  | 1            | 201        | 1.5   | 8
dddeff  | 2            | …
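The question has no answers, so for reference only, a hedged sketch of the standard JDBC write path to SQL Server in Spark 2.x; the URL, credentials, and table name are illustrative, not from the question.

import java.util.Properties
import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder().appName("JdbcWrite").getOrCreate()
import spark.implicits._

// Sample rows shaped like the table above.
val orders = Seq(("dddeff", 1, 201, 1.5, 8), ("dddeff", 2, 202, 2.5, 8))
  .toDF("orderId", "ItemSequence", "OriginalId", "price", "groupId")

val props = new Properties()
props.setProperty("user", "sa")          // assumption
props.setProperty("password", "secret")  // assumption
props.setProperty("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")

orders.write
  .mode(SaveMode.Append)
  .jdbc("jdbc:sqlserver://host:1433;databaseName=mydb", "dbo.orders", props)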
0
votes
1 answer
UnaryTransformer instance throwing ClassCastException
I have a requirement to create my own UnaryTransformer instance that accepts a DataFrame column of type Array[String] and should output the same type. While trying to do so, I encountered a ClassCastException on my Spark version 2.1.0.
I've put…

schengalath
- 11
- 3
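A likely culprit, hedged since the original code is not shown: Spark SQL hands array columns to Scala functions as Seq[String] (a WrappedArray at runtime), so a transformer typed on Array[String] fails with a ClassCastException. A minimal sketch typed on Seq[String], with an illustrative transform:

import org.apache.spark.ml.UnaryTransformer
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.types.{ArrayType, DataType, StringType}

class MyArrayTransformer(override val uid: String)
    extends UnaryTransformer[Seq[String], Seq[String], MyArrayTransformer] {

  def this() = this(Identifiable.randomUID("myArrayTransformer"))

  // Illustrative transform: lower-case every element of the array column.
  override protected def createTransformFunc: Seq[String] => Seq[String] =
    _.map(_.toLowerCase)

  // array<string> in, array<string> out.
  override protected def outputDataType: DataType = ArrayType(StringType)
}

Usage would look like: new MyArrayTransformer().setInputCol("words").setOutputCol("loweredWords").transform(df)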
0
votes
3 answers
How to create a schema for a dataset in a Hive table?
I am building a schema for the dataset below from a Hive table.
After processing I have to write the data to S3.
I need to restructure and group the user ID interactions by date; the attached JSON image shows the format to be prepared.
For building this…

Pradeep.D.s
- 1
- 6
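The table and target JSON layout are not shown, so the following is only a hedged sketch of the general shape: read from Hive, group each user's interactions by date, and write JSON to S3. All table, column, and bucket names are illustrative.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.collect_list

val spark = SparkSession.builder()
  .appName("HiveToS3")
  .enableHiveSupport()   // needed to read Hive tables
  .getOrCreate()

val interactions = spark.sql("SELECT user_id, event_date, action FROM interactions")

// One row per (user, date), with that day's interactions collected together.
val grouped = interactions
  .groupBy("user_id", "event_date")
  .agg(collect_list("action").as("actions"))

grouped.write.json("s3a://my-bucket/user-interactions/")  // bucket is an assumption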
0
votes
1 answer
PySpark: KeyError when converting a DataFrame column of String type to Double
I'm trying to learn machine learning with PySpark. I have a dataset with a couple of String columns whose values are either True or False, or Yes or No. I'm working with DecisionTree and I wanted to convert these String values to…

Sivaprasanna Sethuraman
- 4,014
- 5
- 31
- 60
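The question is PySpark, but the DataFrame API is mirrored across languages; here is a hedged Scala sketch (kept in Scala for consistency with the other sketches on this page) of mapping boolean-like strings to doubles with when/otherwise, since a plain cast of "Yes"/"No" to double yields null rather than 0.0/1.0:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.when

val spark = SparkSession.builder().appName("StringToDouble").getOrCreate()
import spark.implicits._

val df = Seq("True", "False", "Yes", "No").toDF("flag")

// Map the categorical strings to numeric labels explicitly.
val numeric = df.withColumn(
  "flag_num",
  when($"flag".isin("True", "Yes"), 1.0).otherwise(0.0)
)
numeric.show()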
0
votes
3 answers
Search and replace in Apache Spark
We have created two Datasets, sentenceDataFrame and sentenceDataFrame2, where the search and replace should happen.
sentenceDataFrame2 stores the search and replace terms.
We also performed all 11 types of join: 'inner', 'outer', 'full', 'fullouter', 'leftouter',…

Nischay
- 168
- 2
- 14
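A join is one route, but if the replacement table is small, a hedged alternative sketch (column names are illustrative): collect the (search, replace) pairs to the driver and fold them into chained regexp_replace calls.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.regexp_replace

val spark = SparkSession.builder().appName("SearchReplace").getOrCreate()
import spark.implicits._

val sentenceDataFrame  = Seq("spark is fast", "hello spark").toDF("sentence")
val sentenceDataFrame2 = Seq(("spark", "Spark"), ("fast", "quick")).toDF("search", "replace")

// Assumes the replacement table fits on the driver.
val pairs = sentenceDataFrame2.as[(String, String)].collect()

// Note: regexp_replace treats the search term as a regular expression.
val replaced = pairs.foldLeft(sentenceDataFrame) { case (df, (s, r)) =>
  df.withColumn("sentence", regexp_replace($"sentence", s, r))
}
replaced.show(false)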
0
votes
1 answer
Spark 2.0 with spark.read.text Expected scheme-specific part at index 3: s3: error
I am running into a weird issue with Spark 2.0 when using the SparkSession to load a text file. Currently my Spark config looks like:
val sparkConf = new…

Derek_M
- 1,018
- 10
- 22
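Hedged, since the config is truncated: the java.net.URI message "Expected scheme-specific part at index 3: s3:" means a bare "s3:" with nothing after the scheme was parsed somewhere, so the full path string is worth logging before the read. A sketch of a working S3 read; the bucket and credential sources are illustrative, and s3a assumes hadoop-aws on the classpath.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("S3ReadExample").getOrCreate()
val hadoopConf = spark.sparkContext.hadoopConfiguration
// Credentials pulled from the environment; fails fast if they are absent.
hadoopConf.set("fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))
hadoopConf.set("fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY"))

val path = "s3a://my-bucket/input/data.txt"
println(s"Reading from: $path")   // confirm the URI is fully formed
val lines = spark.read.text(path)
lines.show(5, truncate = false)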
0
votes
1 answer
Running a Spark application from the HDInsight cluster head node
I am trying to run a Spark Scala application from the head node of an Azure HDInsight cluster with the command
spark-submit --class com.test.spark.Wordcount SparkJob1.jar
wasbs://containername@/sample.sas7bdat
…

vidyak
- 173
- 4
- 14
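The wasbs URI in the question has nothing between the '@' and the path. Hedged, since the full error is truncated: on HDInsight the expected form is wasbs://<container>@<account>.blob.core.windows.net/<path>. A sketch with illustrative names (a plain-text file is used here, since reading .sas7bdat would need a dedicated reader):

import org.apache.spark.sql.SparkSession

object Wordcount {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("Wordcount").getOrCreate()
    // Full wasbs URI: container, then storage account, then blob path.
    val input = spark.sparkContext.textFile(
      "wasbs://containername@myaccount.blob.core.windows.net/sample.txt")
    val counts = input.flatMap(_.split("\\s+")).map((_, 1)).reduceByKey(_ + _)
    counts.take(10).foreach(println)
  }
}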
0
votes
1 answer
Spark mechanism of launching executors
I know that upon Spark application start, the driver process starts executor processes on the worker nodes. But how exactly does it do that (in low-level terms of the Spark source code)?
What Spark classes/methods implement that functionality? Can someone…
0
votes
1 answer
Using the map function in Apache Spark for a huge operation
We need to calculate a distance matrix, such as Jaccard, on a huge Dataset in Spark.
We are facing a couple of issues. Kindly give us some direction.
Issue 1
import info.debatty.java.stringsimilarity.Jaccard;
//sample Data set creation
…

Nischay
- 168
- 2
- 14
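The sample dataset creation is truncated, so the following is a hedged sketch of the all-pairs pattern with the library named in the question; Jaccard.distance(String, String) is assumed from that library's documentation, and the data is illustrative.

import info.debatty.java.stringsimilarity.Jaccard
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("JaccardMatrix").getOrCreate()
val sc = spark.sparkContext

val items = sc.parallelize(Seq("apple", "apples", "banana"))

// cartesian produces all pairs; the Jaccard instance is created once per
// partition in case the library class is not serializable.
val distances = items.cartesian(items).mapPartitions { pairs =>
  val jaccard = new Jaccard()
  pairs.map { case (a, b) => (a, b, jaccard.distance(a, b)) }
}
distances.take(5).foreach(println)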
0
votes
1 answer
Specifying custom profilers for PySpark running Spark 2.0
I would like to know how to specify a custom profiler class in PySpark for Spark version 2+. Under 1.6, I know I can do so like this:
sc = SparkContext('local', 'test', profiler_cls='MyProfiler')
but when I create the SparkSession in 2.0 I don't…

femibyte
- 3,317
- 7
- 34
- 59
0
votes
1 answer
Spark groupBy operation hangs at 199/200
I have a Spark standalone cluster with a master and two executors. I have an RDD[LevelOneOutput]; below is the LevelOneOutput class
class LevelOneOutput extends Serializable {
  @BeanProperty
  var userId: String = _
  @BeanProperty
  var tenantId:…

Prasad Khode
- 6,602
- 11
- 44
- 59
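Hedged, since the job itself is not shown: a stall at task 199/200 points at the last of the 200 default shuffle partitions (spark.sql.shuffle.partitions), which usually means one skewed key. A sketch of the first knob to try; the data and the chosen value are illustrative.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("GroupByTuning")
  .config("spark.sql.shuffle.partitions", "400") // assumption: tune from 200
  .getOrCreate()
import spark.implicits._

val df = Seq(("u1", "t1"), ("u1", "t2"), ("u2", "t1")).toDF("userId", "tenantId")
df.groupBy("userId").count().show()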
0
votes
1 answer
Migration from Spark 1.6 to Spark 2.1 toLocalIterator throwing error
I have migrated my working code base from Spark 1.6 to 2.1. There was an error while running my code: it shows an error when I use the toLocalIterator method on an RDD. I tried to get a clue from the error log, but it doesn't seem to be…

Bruce
- 8,609
- 8
- 54
- 83
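For reference, a minimal sketch of RDD.toLocalIterator as it works in Spark 2.x: it streams one partition at a time to the driver, so each partition, not the whole RDD, must fit in driver memory.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("LocalIterator").getOrCreate()
val rdd = spark.sparkContext.parallelize(1 to 10, numSlices = 3)

rdd.toLocalIterator.foreach(println)  // iterates partition by partition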
0
votes
1 answer
Iterators with Dataset in Spark 2.0
How do I iterate over a Dataset in Spark 2.0 with Scala? My problem is that I need to compare two rows: I need to compare DateN with DateN-1 and calculate the difference.
Row1 - Date1 Num1
Row2 - Date2 Num2
..
RowN - DateN NumN

coder AJ
- 1
- 4
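Comparing each row with its predecessor is the lag window pattern rather than manual iteration; a hedged sketch with illustrative column names and data:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, datediff, lag}

val spark = SparkSession.builder().appName("RowDiff").getOrCreate()
import spark.implicits._

val ds = Seq(("2017-01-01", 10), ("2017-01-05", 12), ("2017-01-09", 7))
  .toDF("date", "num")

// No partitionBy: everything lands in one window partition, which is fine
// for a sketch but should be partitioned by a key on real data.
val w = Window.orderBy(col("date"))
val withDiff = ds.withColumn("daysSincePrev",
  datediff(col("date"), lag(col("date"), 1).over(w)))
withDiff.show()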