Use for questions specific to Apache Spark 1.6. For general questions related to Apache Spark, use the tag [apache-spark].
Questions tagged [apache-spark-1.6]
111 questions
0 votes
1 answer
How to perform dynamic partitioning based on row count in a DataFrame for a column value
I am trying to partition input files based on accountId, but this partitioning has to be done only if the DataFrame contains more than 1000 records. The accountId is a dynamic integer that cannot be known in advance. Consider the following code:
val ssc =…

Achaius
0 votes
1 answer
How to find out which RDD type Spark infers in Scala
I was trying the following example:
val lista = List(("a", 3), ("a", 1), ("b", 7), ("a", 5))
val rdd = sc.parallelize(lista)
Then in the shell I get the following:
rdd: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[40] at parallelize…

Joseratts
0 votes
0 answers
Spark only uses 1 CPU when 2x4 CPUs are available on reduce()
I have 3 machines: 1x master with 4x CPU and 8G RAM; 2x executors with 4x CPU and 16G RAM.
The master runs in standalone mode (no YARN), and I'm using PySpark.
Even though it is not a huge infrastructure, I would still expect some performance out of it.
When running a…

pltrdy
-1 votes
1 answer
Delete Unicode value in output of Spark 1.6 using Scala
The file generated from the API contains data like the following:
col1,col2,col3
503004,(d$üíõ$F|'.h*Ë!øì=(.î; ,.¡|®!®3-2-704
When I read it in Spark, it appears like this. I am using a case class to read from an RDD and then convert it to a DataFrame using…

Sophie Dinka
-1 votes
1 answer
Reading an encoded value in Spark 1.6 throws an error
I am receiving a file from an API that has encoded (non-ASCII) character values in 3 columns.
When I read the file using a DataFrame in Spark 1.6
val CleanData= sqlContext.sql("""SELECT
COL1
…

Sophie Dinka
-1 votes
1 answer
Read Impala table with SparkSQL
I was trying to execute a query that uses functions like lead .. over .. partition and union. This query works well when I run it on Impala but fails on Hive.
I need to write a Spark job that performs this query. It fails as well in…

New Coder