Use for questions specific to Apache Spark 2.0. For general questions related to Apache Spark use the tag [apache-spark].
Questions tagged [apache-spark-2.0]
464 questions
0
votes
1 answer
Spark mvn compile error: [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile)
I'm learning by following the book 'Spark with Machine Learning'.
groupId: org.apache.spark
artifactId: spark-core_2.11
version: 2.0.1
JavaApp.java
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import…

김시온
- 3
- 3
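The dependency coordinates quoted above would sit in the project's pom.xml roughly as follows. This is a minimal sketch; the compiler-plugin section is an assumption, since this particular Maven error frequently comes down to the Java source/target level not being configured for the compiler plugin:

```xml
<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.0.1</version>
  </dependency>
</dependencies>

<build>
  <plugins>
    <!-- Hypothetical fix: pin the compiler plugin to a Java level
         Spark 2.0 supports (Java 7 or 8). -->
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-compiler-plugin</artifactId>
      <version>3.1</version>
      <configuration>
        <source>1.8</source>
        <target>1.8</target>
      </configuration>
    </plugin>
  </plugins>
</build>
```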
0
votes
0 answers
Spark 2.0 GROUP BY NULLS
I am working on migrating some queries from Spark 1.5 to Spark 2.0.
The query is the following:
SELECT
users.age AS users_age,
NULL
AS users_running_total_on_null
FROM users
GROUP BY users.age
ORDER BY users_age
LIMIT 1
First of all, I know…

theGreenCabbage
- 5,197
- 19
- 79
- 169
0
votes
1 answer
Conceptual difference between RDD and Dataset in Spark 2.0?
I read
What is the difference between Spark DataSet and RDD
Difference between DataSet API and…

Make42
- 12,236
- 24
- 79
- 155
0
votes
1 answer
Issues in fetching data from cassandra using spark cassandra connector
I am trying to migrate from Spark 1.6 to Spark 2.0 and was trying some basic data operations with the Spark Cassandra connector 2.0.
I was just fetching data from a table and printing it using rdd.foreach(println),
but this is giving me an error with…

deenbandhu
- 599
- 5
- 18
0
votes
1 answer
Spark 2.0 `java.lang.ClassCastException: java.lang.Integer cannot be cast to java.sql.Date` error
We maintain a Hive data warehouse and use Spark SQL to run queries against the Hive database and generate reports. We are using Spark 1.6 in an AWS EMR environment, and that has been working fine.
I wanted to upgrade our environments to Spark…

farazZ
- 108
- 9
-1
votes
2 answers
Creating a Third Table Using Two Tables with Spark SQL or PySpark (No Use of pandas)
I am trying to create a third table using two tables with Spark SQL or PySpark (without using pandas).
Dataframe One:
+---------+---------+------------+-----------+
| NAME | NAME_ID | CLIENT | CLIENT_ID…

Rishabh Singh
- 1
- 1
-1
votes
1 answer
Return two columns when mapping over a column list in Spark SQL / Scala
I want to programmatically supply a certain number of fields and, for some of those fields, select a column and pass that field to another function that returns a case class of (String, String). So far I have
val myList = Seq(("a", "b", "c", "d"), ("aa",…

uh_big_mike_boi
- 3,350
- 4
- 33
- 64
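A pure-Python sketch of the mapping pattern the question describes, with a namedtuple standing in for the Scala case class of two strings (the field names and helper here are hypothetical, since the excerpt truncates the real schema):

```python
from collections import namedtuple

# Stand-in for the Scala case class of (String, String).
Pair = namedtuple("Pair", ["field", "value"])

def to_pair(field, row):
    """Hypothetical helper: return the field name and its value as a pair."""
    return Pair(field, row[field])

row = {"a": "1", "b": "2", "c": "3", "d": "4"}
selected = ["a", "c"]  # only some fields get passed through the helper
pairs = [to_pair(f, row) for f in selected]
```

In Spark SQL itself the equivalent would usually be a struct column built per field, but that depends on the schema the excerpt cuts off.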
-1
votes
2 answers
Reuse scripts in spark-shell
I am using Spark with Scala to do time-series analysis. I rewrite the same scripts in spark-shell every time I close and reopen it. I would like suggestions on how to save my scripts from spark-shell and reuse them later.
Do I need to download the Scala IDE,…

Magg_rs
- 323
- 2
- 3
- 12
-1
votes
1 answer
When will encoders for map type be available?
I'm trying to explore Spark 2.0 in depth, so I'm curious to know: when will encoders for map types be available?
Thanks in advance for your suggestions.
Vinoth.
-2
votes
1 answer
How can Apache Spark preserve the order of lines in the output text file?
Can anyone help me understand how apache-spark is able to preserve the order of lines in the output when reading from a text file? Consider the code snippet below:
sparkContext.textFile()
.coalesce(1)
…

Anoop Deshpande
- 514
- 1
- 6
- 23
-2
votes
1 answer
EMR Cluster shows too many executors when spark dynamic allocation is true
I am running a Spark job in cluster mode on EMR 5.27.0. EMR comes with the Spark dynamic allocation property set to true.
Now when I start a Spark job, or even start spark-shell, I can see many executors launched in the Spark UI.
Why is this happening even…

Sarang Shinde
- 717
- 3
- 7
- 24
-2
votes
1 answer
How do I achieve this in Apache Spark Java or Scala?
A device on a car will NOT send a TRIP ID when the trip starts but will send one when the trip ends. How do I apply the corresponding TRIP IDs to the corresponding…

Vinodh Thiagarajan
- 758
- 3
- 9
- 19
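The question is truncated, but the usual shape of this problem is backfilling a trip-end ID onto the earlier rows of the same trip. A pure-Python sketch of that logic, under the assumption that events are ordered by time and the trip ID arrives only on the final event of each trip (in Spark this would typically be done with a window function over the following rows, ignoring nulls):

```python
def backfill_trip_ids(events):
    """events: list of (timestamp, trip_id_or_None), ordered by time.
    The trip id arrives only on the last event of a trip; propagate it
    backwards over the preceding None entries by scanning in reverse."""
    filled, current = [], None
    for ts, trip_id in reversed(events):
        if trip_id is not None:
            current = trip_id  # entering an earlier trip's tail
        filled.append((ts, current))
    filled.reverse()
    return filled

result = backfill_trip_ids(
    [(1, None), (2, None), (3, "T1"), (4, None), (5, "T2")]
)
```

The timestamps and trip IDs here are made up; the point is only the reverse-fill pattern.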
-2
votes
1 answer
Pyspark: not able to print after file text is split
I am new to Spark coding with Python (PySpark).
I have a txt file in which messages need to be split at }{ . That is, a message starts with {...}{...}... like this. I want to split these into
{...}
{...}
{...}
A few also have inner messages…

vasista k.j
- 51
- 1
- 1
- 4
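A plain-Python sketch of the splitting step, assuming the messages are valid JSON objects written back to back. A naive `replace('}{', ...)` can break when `}{` also appears inside nested objects, so this walks the string object by object with `json.JSONDecoder.raw_decode` (in PySpark such a function would then be applied per record or per file):

```python
import json

def split_messages(raw):
    """Parse a string of back-to-back JSON objects into a list of dicts."""
    decoder = json.JSONDecoder()
    messages, idx = [], 0
    while idx < len(raw):
        # raw_decode returns the parsed object and the index just past it.
        obj, idx = decoder.raw_decode(raw, idx)
        messages.append(obj)
    return messages

msgs = split_messages('{"a": 1}{"b": {"c": 2}}{"d": 3}')
```

The sample string is hypothetical; the second object shows that nested ("inner") messages are handled correctly.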
-2
votes
2 answers
spark.read.csv error: java.io.IOException: Permission denied
I am using Spark v2.0 and trying to read a CSV file using:
spark.read.csv("filepath")
But I am getting the error below:
java.lang.RuntimeException: java.lang.RuntimeException: java.io.IOException: Permission denied
at…

Pratyush Sharma
- 279
- 3
- 5