Questions tagged [apache-spark-2.0]
Use for questions specific to Apache Spark 2.0. For general questions related to Apache Spark, use the tag [apache-spark].
464 questions
0
votes
1 answer
Unable to set config in spark-submit from command line
I am trying to set the master URL in the application JAR using the code below:
val spark = SparkSession
  .builder()
  .master("spark://master:7077")
  .appName("TestApp")
  .config("spark.sql.warehouse.dir", "/tmp/spark-warehouse")
  .getOrCreate()
I try to…

sjuggernaut
- 1
- 1
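If the intent is to choose the master at submit time, a minimal sketch under that assumption: Spark gives properties set directly in the application precedence over spark-submit flags, so the .master() call is dropped here and the URL is passed on the command line instead. The object name and warehouse path are illustrative.

import org.apache.spark.sql.SparkSession

object TestApp {
  def main(args: Array[String]): Unit = {
    // No .master() here: values hard-coded in the app override spark-submit,
    // so the master is left to the command line.
    val spark = SparkSession
      .builder()
      .appName("TestApp")
      .config("spark.sql.warehouse.dir", "/tmp/spark-warehouse")
      .getOrCreate()
    // ... application logic ...
    spark.stop()
  }
}

It would then be submitted with something like: spark-submit --master spark://master:7077 --class TestApp app.jar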
0
votes
1 answer
StackOverflowError while using distinct in Apache Spark
I use Spark 2.0.1.
I am trying to find distinct values in a JavaRDD as below
JavaRDD distinct_installedApp_Ids = filteredInstalledApp_Ids.distinct();
This line throws the exception below:
Exception in thread "main"…

Sathiya Narayanan
- 623
- 6
- 27
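The question is truncated, but a StackOverflowError raised by an RDD action is often a symptom of a very long lineage rather than of distinct itself. A hedged sketch under that assumption, with illustrative names: checkpointing truncates the lineage before distinct runs.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("DistinctSketch").getOrCreate()
val sc = spark.sparkContext
sc.setCheckpointDir("/tmp/spark-checkpoints")  // any reliable path works

// Stand-in for filteredInstalledApp_Ids, whose real lineage may be long.
val filteredIds = sc.parallelize(Seq("app1", "app2", "app1", "app3"))
filteredIds.checkpoint()                        // cut the lineage chain
val distinctIds = filteredIds.distinct()
distinctIds.collect().foreach(println)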
0
votes
0 answers
Spark DataFrame to SQL Server storing wrong data for multiple records
The data prints correctly with dataframe.show, but in the database the previous value is being stored.
For example we have 3 records:
orderId | ItemSequence | OriginalId | price | groupId
dddeff  | 1            | 201        | 1.5   | 8
dddeff  | 2            | …
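The question has no answers, so for reference only, a hedged sketch of the standard JDBC write path to SQL Server in Spark 2.x; the URL, credentials, and table name are illustrative, not from the question.

import java.util.Properties
import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder().appName("JdbcWrite").getOrCreate()
import spark.implicits._

// Sample rows shaped like the table above.
val orders = Seq(("dddeff", 1, 201, 1.5, 8), ("dddeff", 2, 202, 2.5, 8))
  .toDF("orderId", "ItemSequence", "OriginalId", "price", "groupId")

val props = new Properties()
props.setProperty("user", "sa")          // assumption
props.setProperty("password", "secret")  // assumption
props.setProperty("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")

orders.write
  .mode(SaveMode.Append)
  .jdbc("jdbc:sqlserver://host:1433;databaseName=mydb", "dbo.orders", props)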
0
votes
1 answer
UnaryTransformer instance throwing ClassCastException
I have a requirement to create my own UnaryTransformer instance that accepts a DataFrame column of type Array[String] and should output the same type. While trying to do so, I encountered a ClassCastException on my Spark version 2.1.0.
I've put…

schengalath
- 11
- 3
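A likely culprit, hedged since the original code is not shown: Spark SQL hands array columns to Scala functions as Seq[String] (a WrappedArray at runtime), so a transformer typed on Array[String] fails with a ClassCastException. A minimal sketch typed on Seq[String], with an illustrative transform:

import org.apache.spark.ml.UnaryTransformer
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.types.{ArrayType, DataType, StringType}

class MyArrayTransformer(override val uid: String)
    extends UnaryTransformer[Seq[String], Seq[String], MyArrayTransformer] {

  def this() = this(Identifiable.randomUID("myArrayTransformer"))

  // Illustrative transform: lower-case every element of the array column.
  override protected def createTransformFunc: Seq[String] => Seq[String] =
    _.map(_.toLowerCase)

  // array<string> in, array<string> out.
  override protected def outputDataType: DataType = ArrayType(StringType)
}

Usage would look like: new MyArrayTransformer().setInputCol("words").setOutputCol("loweredWords").transform(df)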
0
votes
3 answers
How to create a schema for a dataset in a Hive table?
I am building a schema for the dataset below from a Hive table.
After processing I have to write the data to S3.
I need to restructure and group the user ID interactions by date; the attached JSON image shows the format to be prepared.
For building this…

Pradeep.D.s
- 1
- 6
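The table and target JSON layout are not shown, so the following is only a hedged sketch of the general shape: read from Hive, group each user's interactions by date, and write JSON to S3. All table, column, and bucket names are illustrative.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.collect_list

val spark = SparkSession.builder()
  .appName("HiveToS3")
  .enableHiveSupport()   // needed to read Hive tables
  .getOrCreate()

val interactions = spark.sql("SELECT user_id, event_date, action FROM interactions")

// One row per (user, date), with that day's interactions collected together.
val grouped = interactions
  .groupBy("user_id", "event_date")
  .agg(collect_list("action").as("actions"))

grouped.write.json("s3a://my-bucket/user-interactions/")  // bucket is an assumption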
0
votes
1 answer
PySpark: KeyError when converting a DataFrame column of String type to Double
I'm trying to learn machine learning with PySpark. I have a dataset with a couple of String columns whose values are either True or False, or Yes or No. I'm working with DecisionTree and I wanted to convert these String values to…

Sivaprasanna Sethuraman
- 4,014
- 5
- 31
- 60
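The question is PySpark, but the DataFrame API is mirrored across languages; here is a hedged Scala sketch (kept in Scala for consistency with the other sketches on this page) of mapping boolean-like strings to doubles with when/otherwise, since a plain cast of "Yes"/"No" to double yields null rather than 0.0/1.0:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.when

val spark = SparkSession.builder().appName("StringToDouble").getOrCreate()
import spark.implicits._

val df = Seq("True", "False", "Yes", "No").toDF("flag")

// Map the categorical strings to numeric labels explicitly.
val numeric = df.withColumn(
  "flag_num",
  when($"flag".isin("True", "Yes"), 1.0).otherwise(0.0)
)
numeric.show()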
0
votes
3 answers
Search and replace in Apache Spark
We have created two Datasets, sentenceDataFrame and sentenceDataFrame2, where the search and replace should happen.
sentenceDataFrame2 stores the search and replace terms.
We also performed all 11 types of join: 'inner', 'outer', 'full', 'fullouter', 'leftouter',…

Nischay
- 168
- 2
- 14
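A join is one route, but if the replacement table is small, a hedged alternative sketch (column names are illustrative): collect the (search, replace) pairs to the driver and fold them into chained regexp_replace calls.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.regexp_replace

val spark = SparkSession.builder().appName("SearchReplace").getOrCreate()
import spark.implicits._

val sentenceDataFrame  = Seq("spark is fast", "hello spark").toDF("sentence")
val sentenceDataFrame2 = Seq(("spark", "Spark"), ("fast", "quick")).toDF("search", "replace")

// Assumes the replacement table fits on the driver.
val pairs = sentenceDataFrame2.as[(String, String)].collect()

// Note: regexp_replace treats the search term as a regular expression.
val replaced = pairs.foldLeft(sentenceDataFrame) { case (df, (s, r)) =>
  df.withColumn("sentence", regexp_replace($"sentence", s, r))
}
replaced.show(false)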
0
votes
1 answer
Spark 2.0 with spark.read.text Expected scheme-specific part at index 3: s3: error
I am running into a weird issue with Spark 2.0 when using the SparkSession to load a text file. Currently my Spark config looks like:
val sparkConf = new…

Derek_M
- 1,018
- 10
- 22
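Hedged, since the config is truncated: the java.net.URI message "Expected scheme-specific part at index 3: s3:" means a bare "s3:" with nothing after the scheme was parsed somewhere, so the full path string is worth logging before the read. A sketch of a working S3 read; the bucket and credential sources are illustrative, and s3a assumes hadoop-aws on the classpath.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("S3ReadExample").getOrCreate()
val hadoopConf = spark.sparkContext.hadoopConfiguration
// Credentials pulled from the environment; fails fast if they are absent.
hadoopConf.set("fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))
hadoopConf.set("fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY"))

val path = "s3a://my-bucket/input/data.txt"
println(s"Reading from: $path")   // confirm the URI is fully formed
val lines = spark.read.text(path)
lines.show(5, truncate = false)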
0
votes
1 answer
Running a Spark application from the HDInsight cluster head node
I am trying to run a Spark Scala application from the head node of an Azure HDInsight cluster with the command
spark-submit --class com.test.spark.Wordcount SparkJob1.jar
wasbs://containername@/sample.sas7bdat
…

vidyak
- 173
- 4
- 14
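The wasbs URI in the question has nothing between the '@' and the path. Hedged, since the full error is truncated: on HDInsight the expected form is wasbs://<container>@<account>.blob.core.windows.net/<path>. A sketch with illustrative names (a plain-text file is used here, since reading .sas7bdat would need a dedicated reader):

import org.apache.spark.sql.SparkSession

object Wordcount {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("Wordcount").getOrCreate()
    // Full wasbs URI: container, then storage account, then blob path.
    val input = spark.sparkContext.textFile(
      "wasbs://containername@myaccount.blob.core.windows.net/sample.txt")
    val counts = input.flatMap(_.split("\\s+")).map((_, 1)).reduceByKey(_ + _)
    counts.take(10).foreach(println)
  }
}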
0
votes
1 answer
Spark mechanism of launching executors
I know that upon Spark application start, the driver process starts executor processes on the worker nodes. But how exactly does it do that (in low-level terms of the Spark source code)?
What Spark classes/methods implement that functionality? Can someone…
0
votes
1 answer
Using the map function in Apache Spark for a huge operation
We need to calculate a distance matrix, such as Jaccard, on a huge Dataset in Spark.
We are facing a couple of issues. Kindly give us some direction.
Issue 1
import info.debatty.java.stringsimilarity.Jaccard;
//sample Data set creation
…

Nischay
- 168
- 2
- 14
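The sample dataset creation is truncated, so the following is a hedged sketch of the all-pairs pattern with the library named in the question; Jaccard.distance(String, String) is assumed from that library's documentation, and the data is illustrative.

import info.debatty.java.stringsimilarity.Jaccard
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("JaccardMatrix").getOrCreate()
val sc = spark.sparkContext

val items = sc.parallelize(Seq("apple", "apples", "banana"))

// cartesian produces all pairs; the Jaccard instance is created once per
// partition in case the library class is not serializable.
val distances = items.cartesian(items).mapPartitions { pairs =>
  val jaccard = new Jaccard()
  pairs.map { case (a, b) => (a, b, jaccard.distance(a, b)) }
}
distances.take(5).foreach(println)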
0
votes
1 answer
Specifying custom profilers for PySpark running Spark 2.0
I would like to know how to specify a custom profiler class in PySpark for Spark version 2+. Under 1.6, I know I can do so like this:
sc = SparkContext('local', 'test', profiler_cls='MyProfiler')
but when I create the SparkSession in 2.0 I don't…

femibyte
- 3,317
- 7
- 34
- 59
0
votes
1 answer
Spark groupBy operation hangs at 199/200
I have a Spark standalone cluster with a master and two executors. I have an RDD[LevelOneOutput]; below is the LevelOneOutput class
class LevelOneOutput extends Serializable {
  @BeanProperty
  var userId: String = _
  @BeanProperty
  var tenantId:…

Prasad Khode
- 6,602
- 11
- 44
- 59
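Hedged, since the job itself is not shown: a stall at task 199/200 points at the last of the 200 default shuffle partitions (spark.sql.shuffle.partitions), which usually means one skewed key. A sketch of the first knob to try; the data and the chosen value are illustrative.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("GroupByTuning")
  .config("spark.sql.shuffle.partitions", "400") // assumption: tune from 200
  .getOrCreate()
import spark.implicits._

val df = Seq(("u1", "t1"), ("u1", "t2"), ("u2", "t1")).toDF("userId", "tenantId")
df.groupBy("userId").count().show()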
0
votes
1 answer
Migration from Spark 1.6 to Spark 2.1 toLocalIterator throwing error
I have migrated my working code base from Spark 1.6 to 2.1. There was an error while running my code: it shows an error when I use the toLocalIterator method on an RDD. I tried to get a clue from the error log, but it doesn't seem to be…

Bruce
- 8,609
- 8
- 54
- 83
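For reference, a minimal sketch of RDD.toLocalIterator as it works in Spark 2.x: it streams one partition at a time to the driver, so each partition, not the whole RDD, must fit in driver memory.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("LocalIterator").getOrCreate()
val rdd = spark.sparkContext.parallelize(1 to 10, numSlices = 3)

rdd.toLocalIterator.foreach(println)  // iterates partition by partition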
0
votes
1 answer
Iterators with Dataset in Spark 2.0
How do I iterate over a Dataset in Spark 2.0 with Scala? My problem is that I need to compare two rows: I need to compare DateN with DateN-1 and calculate the difference.
Row1 - Date1 Num1
Row2 - Date2 Num2
..
RowN - DateN NumN

coder AJ
- 1
- 4
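Comparing each row with its predecessor is the lag window pattern rather than manual iteration; a hedged sketch with illustrative column names and data:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, datediff, lag}

val spark = SparkSession.builder().appName("RowDiff").getOrCreate()
import spark.implicits._

val ds = Seq(("2017-01-01", 10), ("2017-01-05", 12), ("2017-01-09", 7))
  .toDF("date", "num")

// No partitionBy: everything lands in one window partition, which is fine
// for a sketch but should be partitioned by a key on real data.
val w = Window.orderBy(col("date"))
val withDiff = ds.withColumn("daysSincePrev",
  datediff(col("date"), lag(col("date"), 1).over(w)))
withDiff.show()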