Questions tagged [apache-spark-1.6]

Use for questions specific to Apache Spark 1.6. For general questions related to Apache Spark use the tag [apache-spark].

111 questions
0
votes
1 answer

How to perform dynamic partition based on row count in DataFrame for a column value

I am trying to partition input files based on accountId, but this partitioning has to be done only if the DataFrame contains more than 1000 records. The accountId is a dynamic integer that may not be known in advance. Consider the following code: val ssc =…
Achaius
  • 5,904
  • 21
  • 65
  • 122
0
votes
1 answer

How to know which is the RDD type inferred by Spark using Scala

I was trying the following example: val lista = List(("a", 3), ("a", 1), ("b", 7), ("a", 5)) val rdd = sc.parallelize(lista) Then in the shell I get the following: rdd: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[40] at parallelize…
Joseratts
  • 97
  • 1
  • 9
0
votes
0 answers

Spark only uses 1CPU when 2x4CPU are available on reduce()

I have 3 machines: 1x master with 4x CPU and 8G RAM; 2x executors with 4x CPU and 16G RAM. The master runs in standalone mode (no YARN), and I'm using pyspark. Even though it is not a huge infrastructure, I would still expect some performance out of it. When running a…
pltrdy
  • 2,069
  • 1
  • 11
  • 29
-1
votes
1 answer

Delete Unicode value in output of Spark 1.6 using Scala

The file generated from the API contains data like below: col1,col2,col3 503004,(d$üíõ$F|'.h*Ë!øì=(.î;      ,.¡|®!®3-2-704 When I read it in Spark, it appears like this. I am using a case class to read from an RDD and then convert it to a DataFrame using…
Sophie Dinka
  • 73
  • 1
  • 8
-1
votes
1 answer

Reading encoded value in Spark 1.6 throws error

I am receiving a file from an API that has encoded (non-ASCII) character values in 3 columns. When I read the file using a DataFrame in Spark 1.6: val CleanData= sqlContext.sql("""SELECT COL1 …
-1
votes
1 answer

Read Impala table with SparkSQL

I was trying to execute a query that had functions like lead .. over .. partition and Union. This query works well when I try to run it on impala but fails on Hive. I need to write a Spark job that performs this query. It is failing as well in…
New Coder
  • 499
  • 4
  • 22