Questions tagged [apache-spark-2.0]

Use for questions specific to Apache Spark 2.0. For general questions related to Apache Spark use the tag [apache-spark].

464 questions
0
votes
1 answer

Spark mvn compile error: [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile)

I'm learning by following the book 'Spark with Machine Learning'. groupId: org.apache.spark artifactId: spark-core_2.11 version: 2.0.1 JavaApp.java import org.apache.spark.api.java.JavaRDD; import org.apache.spark.api.java.JavaSparkContext; import…
김시온
0
votes
0 answers

Spark 2.0 GROUP BY NULLS

Working on migrating some queries from Spark 1.5 to Spark 2.0. The query is the following: SELECT users.age AS users_age, NULL AS users_running_total_on_null FROM users GROUP BY users.age ORDER BY users_age LIMIT 1. First of all, I know…
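Without the rest of the question this is a guess, but untyped NULL literals (NullType columns) are a common source of 1.5-to-2.0 migration errors, and the usual workaround is to give the literal an explicit type with CAST. A minimal sketch, assuming a local session; the users table contents are hypothetical, the column names come from the question:

```scala
import org.apache.spark.sql.SparkSession

object GroupByNullLiteral extends App {
  val spark = SparkSession.builder().master("local[*]").appName("group-by-null").getOrCreate()
  import spark.implicits._

  // Hypothetical users table standing in for the one in the question.
  Seq(25, 25, 31).toDF("age").createOrReplaceTempView("users")

  // Casting the NULL literal gives it a concrete type, which Spark 2.0
  // handles more predictably than an untyped NULL in the SELECT list.
  spark.sql(
    """SELECT users.age AS users_age,
      |       CAST(NULL AS DOUBLE) AS users_running_total_on_null
      |FROM users
      |GROUP BY users.age
      |ORDER BY users_age
      |LIMIT 1""".stripMargin).show()
}
```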
0
votes
1 answer

Conceptual difference between RDD and Dataset in Spark 2.0?

I read 'What is the difference between Spark DataSet and RDD' and 'Difference between DataSet API and…
Make42
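The short version of the usual answer: an RDD is a collection of JVM objects processed with functions that are opaque to Spark, while a Dataset pairs typed objects with a schema and an encoder so Catalyst can optimize the plan and use the compact Tungsten format. A minimal sketch contrasting the two; all names are illustrative:

```scala
import org.apache.spark.sql.SparkSession

// Top-level case class so Spark can derive an encoder for the Dataset.
case class Person(name: String, age: Int)

object RddVsDataset extends App {
  val spark = SparkSession.builder().master("local[*]").appName("rdd-vs-ds").getOrCreate()
  import spark.implicits._

  val people = Seq(Person("Ann", 34), Person("Bob", 19))

  // RDD: Spark sees opaque JVM objects; the filter lambda is a black box.
  val rdd = spark.sparkContext.parallelize(people)
  println(rdd.filter(_.age > 21).count())

  // Dataset: same objects, but with a schema and encoder, so Catalyst
  // understands the predicate and can optimize the physical plan.
  val ds = people.toDS()
  println(ds.filter($"age" > 21).count())
}
```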
0
votes
1 answer

Issues in fetching data from cassandra using spark cassandra connector

I am trying to migrate from Spark 1.6 to Spark 2.0 and was trying some basic data operations with the Spark Cassandra connector 2.0. I was just fetching data from a table and printing it using rdd.foreach(println), but this is giving me an error with…
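One detail worth checking alongside the error: on a cluster, rdd.foreach(println) prints on the executors, not the driver, so collecting first is the usual way to see rows locally. A sketch with the spark-cassandra-connector 2.0 API; the connection host, keyspace, and table names are assumptions:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._ // adds sc.cassandraTable

object CassandraFetch extends App {
  val conf = new SparkConf()
    .setAppName("cassandra-fetch")
    .set("spark.cassandra.connection.host", "127.0.0.1") // assumption: local Cassandra
  val sc = new SparkContext(conf)

  // "ks" and "users" are placeholder keyspace/table names.
  val rdd = sc.cassandraTable("ks", "users")

  // foreach(println) runs on executors; collect() brings the rows to the
  // driver so they print where you can see them (small tables only).
  rdd.collect().foreach(println)
}
```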
0
votes
1 answer

Spark 2.0 `java.lang.ClassCastException: java.lang.Integer cannot be cast to java.sql.Date` error

We are maintaining a Hive data warehouse and use Spark SQL to run queries against the Hive database and generate reports. We are using Spark 1.6 in an AWS EMR environment and that has been working fine. I wanted to upgrade our environments to Spark…
farazZ
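Without the full stack trace this is a guess, but when a Hive column's stored type no longer matches what Spark 2.x resolves for it, the usual fix is to cast explicitly before treating the value as a date instead of relying on 1.6-era implicit coercion. A hedged sketch; the table and column names are hypothetical:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.DateType

object ExplicitDateCast extends App {
  val spark = SparkSession.builder()
    .appName("explicit-date-cast")
    .enableHiveSupport() // assumes a configured Hive metastore
    .getOrCreate()

  // "reports" and "report_dt" are placeholders for the real Hive table/column.
  val df = spark.table("reports")

  // An explicit cast states the intended type up front, avoiding the
  // integer-vs-date mismatch at read time.
  val fixed = df.withColumn("report_dt", col("report_dt").cast(DateType))
  fixed.show()
}
```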
-1
votes
2 answers

Creating a Third Table from Two Tables with Spark SQL or PySpark (No Use of pandas (Python))

I am trying to create a third table from two tables with Spark SQL or PySpark (no use of pandas (Python)). Dataframe One: +---------+---------+------------+-----------+ | NAME | NAME_ID | CLIENT | CLIENT_ID…
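The usual pure-Spark route is a join on the shared key followed by a select, with no pandas involved. A sketch under assumed data; the real schemas are truncated in the question, so the rows and the NAME_ID/CLIENT_ID join key are illustrative:

```scala
import org.apache.spark.sql.SparkSession

object ThirdTableFromTwo extends App {
  val spark = SparkSession.builder().master("local[*]").appName("third-table").getOrCreate()
  import spark.implicits._

  // Hypothetical stand-ins for the two DataFrames in the question.
  val names   = Seq(("alice", 1), ("bob", 2)).toDF("NAME", "NAME_ID")
  val clients = Seq(("acme", 1), ("globex", 2)).toDF("CLIENT", "CLIENT_ID")

  // Join on the shared id and keep only the columns the third table needs.
  val third = names
    .join(clients, names("NAME_ID") === clients("CLIENT_ID"))
    .select("NAME", "CLIENT")

  third.createOrReplaceTempView("third_table") // also queryable via spark.sql
  third.show()
}
```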
-1
votes
1 answer

Return two columns when mapping through a column list in Spark SQL (Scala)

I want to programmatically give a certain number of fields and, for some fields, select a column and pass that field to another function that will return a case class of (String, String). So far I have val myList = Seq(("a", "b", "c", "d"), ("aa",…
uh_big_mike_boi
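One way to do this is to pick the two column names at runtime, alias them to the case class's field names, and convert with .as[...] to get a typed Dataset back. A sketch; the case class, column names, and helper are assumptions, not the asker's code:

```scala
import org.apache.spark.sql.SparkSession

// Case class of (String, String) as described in the question.
case class Pair(left: String, right: String)

object TwoColumnMap extends App {
  val spark = SparkSession.builder().master("local[*]").appName("two-cols").getOrCreate()
  import spark.implicits._

  val df = Seq(("a", "b", "c", "d"), ("aa", "bb", "cc", "dd"))
    .toDF("c1", "c2", "c3", "c4")

  // Aliases must match the case class field names for .as[Pair] to resolve.
  def selectPair(fieldA: String, fieldB: String) =
    df.select(
      df(fieldA).cast("string").as("left"),
      df(fieldB).cast("string").as("right")
    ).as[Pair]

  selectPair("c1", "c3").show()
}
```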
-1
votes
2 answers

Reuse scripts in spark-shell

I am using Spark with Scala to do time-series analysis. I rewrite the same scripts in spark-shell every time I close and reopen it. I would like suggestions on how to save my scripts from spark-shell and use them later. Do I need to download the Scala IDE,…
Magg_rs
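No IDE is required for this: spark-shell can load a plain .scala file, either at startup with -i or from inside a running shell with :load. A sketch of such a script; the file name, input path, and queries are hypothetical:

```scala
// analysis.scala -- a plain text file holding the commands you retype each session.
// Load it at startup:       spark-shell -i analysis.scala
// Or from a running shell:  :load analysis.scala

// spark and sc are already defined inside spark-shell.
val ts = spark.read.option("header", "true").csv("timeseries.csv") // hypothetical input
ts.createOrReplaceTempView("ts")
spark.sql("SELECT COUNT(*) FROM ts").show()
```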
-1
votes
1 answer

When will encoders for map type be available?

I'm trying to explore a lot in Spark 2.0, so I'm curious to know: when will encoders for map types be available? Thanks in advance for your suggestions. Vinoth.
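In Spark 2.0 there is no built-in implicit encoder for a bare Map (one arrived in a later 2.x release), but a Kryo-based encoder works as a stopgap. A sketch; the element type is illustrative:

```scala
import org.apache.spark.sql.{Encoder, Encoders, SparkSession}

object MapEncoderWorkaround extends App {
  val spark = SparkSession.builder().master("local[*]").appName("map-encoder").getOrCreate()
  import spark.implicits._

  // Spark 2.0 ships no implicit encoder for Map, so supply one via Kryo.
  // Note: Kryo stores the value as an opaque binary blob, so you lose
  // columnar optimizations on it -- a stopgap, not a first-class encoder.
  implicit val mapEncoder: Encoder[Map[String, Int]] =
    Encoders.kryo[Map[String, Int]]

  val ds = Seq(Map("a" -> 1), Map("b" -> 2)).toDS()
  ds.map(_.keys.mkString(",")).collect().foreach(println)
}
```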
-2
votes
1 answer

How can Apache Spark preserve the order of lines in the output text file?

Can anyone help me understand how apache-spark is able to preserve the order of lines in the output when reading from a textFile? Consider the code snippet below: sparkContext.textFile() .coalesce(1) …
Anoop Deshpande
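The gist of the usual answer: textFile keeps line order within each partition, partitions are indexed in file order, and coalesce(1) without a shuffle merges partitions in index order, so the single output part keeps the input order. A sketch; the paths are hypothetical:

```scala
import org.apache.spark.sql.SparkSession

object PreserveLineOrder extends App {
  val spark = SparkSession.builder().master("local[*]").appName("line-order").getOrCreate()
  val sc = spark.sparkContext

  // coalesce(1) with shuffle = false (the default) concatenates partitions
  // 0..n-1 in index order, preserving line order end to end. A shuffle
  // (coalesce(1, shuffle = true) or repartition) would not guarantee this.
  sc.textFile("input.txt")          // hypothetical input path
    .coalesce(1)
    .saveAsTextFile("ordered-out")  // hypothetical output dir
}
```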
-2
votes
1 answer

EMR Cluster shows too many executors when spark dynamic allocation is true

I am running a Spark job in cluster mode on EMR 5.27.0. EMR comes with the Spark dynamic allocation property set to true. Now when I start a Spark job, or even the Spark shell, I can see many executors launched in the Spark UI. Why is this happening even…
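With dynamic allocation on, Spark requests executors to match the pending-task backlog and releases idle ones, so a burst of executors right after startup is expected. If a fixed count is wanted, disabling it and pinning the instance count is the usual route. A sketch of the relevant settings; the values are illustrative:

```scala
import org.apache.spark.sql.SparkSession

object FixedExecutors extends App {
  // With dynamic allocation enabled (the EMR default), executor count tracks
  // the task backlog. Disabling it pins the count to spark.executor.instances.
  val spark = SparkSession.builder()
    .appName("fixed-executors")
    .config("spark.dynamicAllocation.enabled", "false")
    .config("spark.executor.instances", "4") // illustrative value
    .getOrCreate()

  // Equivalent on the command line:
  //   spark-submit --conf spark.dynamicAllocation.enabled=false --num-executors 4 ...
  println(spark.conf.get("spark.dynamicAllocation.enabled"))
}
```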
-2
votes
1 answer

How do I achieve this in Apache Spark (Java or Scala)?

A device on a car will NOT send a TRIP ID when the trip starts but will send one when the trip ends. How do I apply corresponding TRIP IDs to the corresponding…
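Since the trip ID arrives only on a trip's last record, the usual approach is a backward fill: for each row, take the first non-null ID among the current and following rows, partitioned per device and ordered by time. A sketch; the column names and sample rows are hypothetical:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.first

object BackfillTripIds extends App {
  val spark = SparkSession.builder().master("local[*]").appName("trip-ids").getOrCreate()
  import spark.implicits._

  // Hypothetical events: trip_id is present only on each trip's last record.
  val events = Seq(
    ("dev1", 1L, null.asInstanceOf[String]),
    ("dev1", 2L, null.asInstanceOf[String]),
    ("dev1", 3L, "TRIP-A"),
    ("dev1", 4L, null.asInstanceOf[String]),
    ("dev1", 5L, "TRIP-B")
  ).toDF("device", "ts", "trip_id")

  // Look forward from each row to the first non-null trip_id.
  // (Long.MaxValue stands in for "unbounded following" in Spark 2.0.)
  val w = Window.partitionBy("device").orderBy("ts").rowsBetween(0, Long.MaxValue)
  events
    .withColumn("trip_id", first($"trip_id", ignoreNulls = true).over(w))
    .show()
}
```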
-2
votes
1 answer

PySpark: not able to print after the text file is split

I am new to Spark coding with Python (PySpark). I have a txt file in which messages need to be split at }{ . That is, a message starts with {...}{...}... like this. I want to split these into {...} {...} {...}; a few also have inner message…
vasista k.j
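Splitting on the literal }{ consumes both braces; splitting on the zero-width boundary between } and { keeps them. The regex is the same in Scala and PySpark; note that for messages with nested objects a real JSON parser is safer than a regex. A sketch in Scala with a made-up sample line:

```scala
object SplitConcatenatedJson extends App {
  val line = """{"id":1}{"id":2}{"id":3}""" // sample concatenated messages

  // Lookbehind/lookahead match the empty position between } and {, so
  // split() keeps the braces on both sides instead of consuming them.
  val messages = line.split("""(?<=\})(?=\{)""")
  messages.foreach(println) // prints each {...} message on its own line

  // In Spark: sc.textFile(path).flatMap(_.split("""(?<=\})(?=\{)"""))
}
```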
-2
votes
2 answers

spark.read.csv error: java.io.IOException: Permission denied

I am using Spark v2.0 and trying to read a CSV file using: spark.read.csv("filepath") But I am getting the error below: java.lang.RuntimeException: java.lang.RuntimeException: java.io.IOException: Permission denied at…
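Permission denied here usually means the OS user running the Spark process cannot read the file, or, on a cluster, that the path exists only on the driver. Checking readability up front and using a fully qualified URI narrows it down. A sketch; the path is hypothetical:

```scala
import java.nio.file.{Files, Paths}
import org.apache.spark.sql.SparkSession

object ReadCsvChecked extends App {
  val spark = SparkSession.builder().master("local[*]").appName("read-csv").getOrCreate()

  val path = "/data/input.csv" // hypothetical path

  // Fail fast with a clear message if this process's user can't read it.
  require(Files.isReadable(Paths.get(path)), s"not readable by this user: $path")

  // An explicit scheme avoids ambiguity between local and HDFS paths; on a
  // real cluster the file must be readable on every executor node too.
  val df = spark.read.option("header", "true").csv(s"file://$path")
  df.show()
}
```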