Questions tagged [apache-spark-1.6]
Use for questions specific to Apache Spark 1.6. For general questions related to Apache Spark use the tag [apache-spark].
111 questions
1
vote
1 answer
How to extract the ElementType of an Array as an instance of StructType
I am trying to decompose the structure of a complex DataFrame in Spark. I am only interested in the nested arrays under the root. The issue is that I can't retrieve the ElementType from a StructField's DataType.
Here is an example, this schema of a…

Ismail Addou
- 383
- 1
- 2
- 17
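One way to recover the element type is to pattern-match on the schema's fields. A minimal sketch, where the example schema below is invented for illustration (not the asker's actual schema):

```scala
import org.apache.spark.sql.types._

// Invented example schema: an array of structs nested under the root
val schema = StructType(Seq(
  StructField("id", StringType),
  StructField("events", ArrayType(StructType(Seq(
    StructField("name", StringType),
    StructField("ts", LongType)))))))

// Match each top-level array field whose element type is a struct,
// and recover that element type as a StructType
val nestedStructs: Seq[(String, StructType)] = schema.fields.toSeq.collect {
  case StructField(name, ArrayType(element: StructType, _), _, _) => name -> element
}
```

The match on `ArrayType(element: StructType, _)` is what downcasts the generic `DataType` back to a `StructType`, which a plain field access cannot do.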
1
vote
1 answer
How to unregister Spark UDF
I use Spark 1.6.0 with Java.
I'd like to unregister a Spark UDF. Is there a way to do it, similar to dropping a temporary table with sqlContext.dropTempTable(tableName)?
sqlContext.udf().register("isNumeric", value -> {
…

JasonG
- 13
- 1
- 3
1
vote
1 answer
How to find the schema of values in DStream at runtime?
I use Spark 1.6 and Kafka 0.8.2.1.
I am trying to fetch some data from Kafka using Spark Streaming and do some operations on that data.
For that I need to know the schema of the fetched data. Is there some way to do this, or can we get values from…

JSR29
- 354
- 1
- 5
- 17
1
vote
1 answer
Why does reading from Hive fail with "java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found"?
I use Spark v1.6.1 and Hive v1.2.x with Python v2.7
For Hive, I have some tables (ORC files) stored in HDFS and some stored in S3. When I try to join 2 tables, where one is in HDFS and the other is in S3, a java.lang.RuntimeException:…

Jane Wayne
- 8,205
- 17
- 75
- 120
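The usual fix for this ClassNotFoundException is to put the S3A classes on Spark's classpath at submit time. A sketch, where the jar paths and versions are assumptions and must match the Hadoop build your Spark runs on:

```shell
# hadoop-aws provides org.apache.hadoop.fs.s3a.S3AFileSystem; aws-java-sdk is its
# dependency. The versions below are illustrative, not prescriptive.
spark-submit \
  --jars /path/to/hadoop-aws-2.7.1.jar,/path/to/aws-java-sdk-1.7.4.jar \
  --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
  my_job.py
```

The same jars can alternatively go into `spark.driver.extraClassPath` / `spark.executor.extraClassPath` if `--jars` is not an option in the deployment.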
1
vote
2 answers
Why does reading from CSV fail with NumberFormatException?
I use Spark 1.6.0 and Scala 2.10.5.
$ spark-shell --packages com.databricks:spark-csv_2.10:1.5.0
import org.apache.spark.sql.SQLContext
import sqlContext.implicits._
import org.apache.spark.sql.types.{StructType, StructField, StringType,…

codelover
- 15
- 6
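A NumberFormatException from spark-csv usually means a non-numeric value, often the header row itself, is being parsed into a numeric column. A sketch that sidesteps it with an explicit schema plus the header option; the column names and file path are invented:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType}

val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("csv"))
val sqlContext = new SQLContext(sc)

// Declare column types up front instead of casting blindly
val schema = StructType(Seq(
  StructField("name", StringType),
  StructField("age", IntegerType)))

val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true") // consume the header line rather than parsing it as data
  .schema(schema)
  .load("people.csv")
```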
1
vote
3 answers
Calculate maximum number of observations per group
I use Spark 1.6.2.
I need to find the maximum count per group.
val myData = Seq(("aa1", "GROUP_A", "10"),("aa1","GROUP_A", "12"),("aa2","GROUP_A", "12"),("aa3", "GROUP_B", "14"),("aa3","GROUP_B", "11"),("aa3","GROUP_B","12" ),("aa2", "GROUP_B",…

Dinosaurius
- 8,306
- 19
- 64
- 113
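One answer-style sketch: count the rows per (group, id) pair, then take the largest such count within each group. This reproduces only the complete tuples from the question's data (the last one is truncated there and is left out):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions.{count, max}

val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("maxPerGroup"))
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

val myData = sc.parallelize(Seq(
  ("aa1", "GROUP_A", "10"), ("aa1", "GROUP_A", "12"), ("aa2", "GROUP_A", "12"),
  ("aa3", "GROUP_B", "14"), ("aa3", "GROUP_B", "11"), ("aa3", "GROUP_B", "12")))
  .toDF("id", "group", "value")

// rows per (group, id) pair, then the largest such count in each group
val counts = myData.groupBy("group", "id").agg(count("*").as("cnt"))
val maxPerGroup = counts.groupBy("group").agg(max("cnt").as("maxCount"))
```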
1
vote
2 answers
Pivot spark scala dataframe
I am trying to use the pivot method in Scala Spark:
val dfOutput = df_input.groupBy("memberlogin").pivot("country_group2").count()
However, though there is no compilation error while creating a jar in Eclipse,
during execution in Spark it gives…

rajendra patil
- 31
- 1
- 6
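Worth noting: pivot on a grouped DataFrame only exists from Spark 1.6.0 onward, so a jar that compiles fine in Eclipse can still fail at runtime (e.g. with NoSuchMethodError) if the cluster runs an older Spark. A self-contained sketch using the question's column names with invented data:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("pivot"))
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

// Invented sample data with the question's column names
val df_input = sc.parallelize(Seq(
  ("u1", "EU"), ("u1", "US"), ("u2", "EU"))).toDF("memberlogin", "country_group2")

// One row per memberlogin, one column per distinct country_group2 value
val dfOutput = df_input.groupBy("memberlogin").pivot("country_group2").count()
```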
1
vote
1 answer
DataFrame: too many arguments when converting from an RDD object
I tried to use this question to convert an RDD object to a DataFrame in Spark. The case class in my use case contains more than 100 arguments (columns):
case class MyClass(val1: String, ..., val104: String )
val df = rdd.map({
case Row(val1:…

Zied Hermi
- 229
- 1
- 2
- 11
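On Scala 2.10, which Spark 1.6 ships with, case classes are capped at 22 fields, so a 104-argument case class will not compile; the usual workaround is a programmatic schema with Row objects. A sketch with invented column names and data:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.types.{StructType, StructField, StringType}

val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("wide"))
val sqlContext = new SQLContext(sc)

// Build the 104-column schema programmatically instead of with a case class
val schema = StructType((1 to 104).map(i => StructField(s"val$i", StringType)))

// Each input record becomes a Row with 104 values (invented data here)
val rowRdd = sc.parallelize(Seq.fill(3)(Row.fromSeq((1 to 104).map(i => s"v$i"))))
val df = sqlContext.createDataFrame(rowRdd, schema)
```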
1
vote
1 answer
Why does executing SQL against Hive table using SQLContext in application fail (but the same query in spark-shell works fine)?
I am using Spark 1.6.
I am trying to connect to a table in my Spark SQL Java code with:
JavaSparkContext js = new JavaSparkContext();
SQLContext sc = new SQLContext(js);
DataFrame mainFile = sc.sql("Select * from db.table");
It gives me a table…

Aviral Kumar
- 814
- 1
- 15
- 40
1
vote
0 answers
Spark Datasets: MissingRequirementError exception when using groupBy
I am starting to work with Spark Datasets and I am facing this exception when I execute a groupBy in Spark 1.6.1:
case class RecordIdDate(recordId: String, date: String)
val ds = sc.parallelize(List(RecordIdDate("hello","1"),…

Mikel San Vicente
- 3,831
- 2
- 21
- 39
1
vote
0 answers
Apache Spark self join big data set on multiple columns
I'm running Apache Spark on a Hadoop cluster using YARN.
I have a big data set, something like 160 million records, and I have to perform a self join. The join is done on an exact match of 1 column (c1), a date overlap match, and a match on at least 1 of 2…

Sorin
- 61
- 6
1
vote
1 answer
KMeans with Spark 1.6.2 vs Spark 2.0.0
I am using KMeans() in an environment I have no control over and will abandon in less than a month; Spark 1.6.2 is installed.
Should I pay the price of urging 'them' to upgrade to Spark 2.0.0 before I leave? In other words, does Spark 2.0.0 introduce any…

gsamaras
- 71,951
- 46
- 188
- 305
1
vote
0 answers
pyspark installation error, "ImportError: No module named pyspark"
I am trying to install Apache Spark 1.6.1 in standalone mode. I have followed the guide at "https://github.com/KristianHolsheimer/pyspark-setup-guide".
But, after the execution of
$ sbt/sbt assembly
I have tried
$ ./bin/run-example SparkPi 10
but,…

Sounak
- 13
- 3
1
vote
1 answer
How to unit test Spark Streaming code?
I use the latest Spark 1.6.0.
I looked at another Stack Overflow post, How can I make Spark Streaming count the words in a file in a unit test?
I am trying to use the sample at https://gist.github.com/emres/67b4eae86fa92df69f61 for writing a sample…

CodeDreamer
- 444
- 2
- 8
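A common pattern for exercising streaming logic deterministically is ssc.queueStream, which serves pre-built RDDs as micro-batches without any external source. A minimal sketch; the word-count logic and test data are invented for illustration:

```scala
import scala.collection.mutable
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setMaster("local[2]").setAppName("streaming-test")
val ssc = new StreamingContext(conf, Seconds(1))

// Each queued RDD is served as one micro-batch
val batches = mutable.Queue(ssc.sparkContext.parallelize(Seq("a b", "b c")))
val counts = mutable.Map.empty[String, Long]

ssc.queueStream(batches)
  .flatMap(_.split(" "))
  .foreachRDD { rdd =>
    // countByValue is an action, so this runs on the driver and can fill the map
    rdd.countByValue().foreach { case (w, n) => counts(w) = counts.getOrElse(w, 0L) + n }
  }

ssc.start()
ssc.awaitTerminationOrTimeout(3000) // let the batch be processed, then tear down
ssc.stop()
```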
0
votes
0 answers
How to combine UDFs when creating a new column in Pyspark 1.6
I am trying to aggregate a table around one key value (id here) so that I can have one row per id, and perform some verifications on the rows that belong to each id in order to identify the 'result' (a type of transaction of sorts). Let's…

Leonardo Novais
- 1
- 1