Questions tagged [apache-spark-1.6]

Use for questions specific to Apache Spark 1.6. For general questions related to Apache Spark, use the tag [apache-spark].

111 questions
1
vote
1 answer

How to extract the ElementType of an Array as an instance of StructType

I am trying to decompose the structure of a complex DataFrame in Spark. I am only interested in the nested arrays under the root. The issue is that I can't retrieve the ElementType from the type of a StructField. Here is an example; this schema of a…
Ismail Addou
  • 383
  • 1
  • 2
  • 17
1
vote
1 answer

How to unregister Spark UDF

I use Spark 1.6.0 with Java. I'd like to unregister a Spark UDF. Is there a way, like dropping a temporary table with sqlContext.dropTempTable(tableName)? sqlContext.udf().register("isNumeric", value -> { …
JasonG
  • 13
  • 1
  • 3
1
vote
1 answer

How to find the schema of values in DStream at runtime?

I use Spark 1.6 and Kafka 0.8.2.1. I am trying to fetch some data from Kafka using Spark Streaming and perform some operations on it. For that I need to know the schema of the fetched data. Is there some way to do this, or can we get values from…
JSR29
  • 354
  • 1
  • 5
  • 17
1
vote
1 answer

Why does reading from Hive fail with "java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found"?

I use Spark v1.6.1 and Hive v1.2.x with Python v2.7. For Hive, I have some tables (ORC files) stored in HDFS and some stored in S3. When I try to join two tables, where one is in HDFS and the other in S3, a java.lang.RuntimeException:…
Jane Wayne
  • 8,205
  • 17
  • 75
  • 120
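The usual diagnosis for this error: `org.apache.hadoop.fs.s3a.S3AFileSystem` lives in the `hadoop-aws` artifact, which is not on Spark's classpath by default. A hedged config sketch follows; the jar paths and version numbers are assumptions and must match your Hadoop build.

```shell
# hadoop-aws (which contains org.apache.hadoop.fs.s3a.S3AFileSystem) and a
# matching aws-java-sdk jar are not shipped on Spark's classpath by default.
# Versions below are illustrative; align them with your Hadoop distribution.
spark-submit \
  --jars /path/to/hadoop-aws-2.7.3.jar,/path/to/aws-java-sdk-1.7.4.jar \
  my_app.py

# or equivalently in conf/spark-defaults.conf (paths are placeholders):
# spark.driver.extraClassPath    /path/to/hadoop-aws-2.7.3.jar:/path/to/aws-java-sdk-1.7.4.jar
# spark.executor.extraClassPath  /path/to/hadoop-aws-2.7.3.jar:/path/to/aws-java-sdk-1.7.4.jar
```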
1
vote
2 answers

Why does reading from CSV fail with NumberFormatException?

I use Spark 1.6.0 and Scala 2.10.5. $ spark-shell --packages com.databricks:spark-csv_2.10:1.5.0 import org.apache.spark.sql.SQLContext import sqlContext.implicits._ import org.apache.spark.sql.types.{StructType, StructField, StringType,…
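The excerpt is truncated, but a common cause of NumberFormatException with spark-csv is applying a numeric schema while the file's header row is still read as data, so the cast hits a literal column name. A plain-Python sketch of that failure mode (the file contents are invented):

```python
# Plain-Python sketch of the usual failure mode: a numeric schema applied to
# a CSV whose first line is a header. Spark's numeric cast then hits the
# literal column name and throws NumberFormatException (ValueError here).
csv_lines = ["id,amount", "1,10", "2,20"]   # hypothetical file contents

def parse(lines, skip_header):
    rows = lines[1:] if skip_header else lines
    return [tuple(int(v) for v in line.split(",")) for line in rows]

try:
    parse(csv_lines, skip_header=False)      # like reading without the header option
except ValueError as e:
    print("fails like NumberFormatException:", e)

print(parse(csv_lines, skip_header=True))    # [(1, 10), (2, 20)]
```

In spark-csv the equivalent switch is `.option("header", "true")` on the reader.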
1
vote
3 answers

Calculate maximum number of observations per group

I use Spark 1.6.2. I need to find the maximum count per group. val myData = Seq(("aa1", "GROUP_A", "10"),("aa1","GROUP_A", "12"),("aa2","GROUP_A", "12"),("aa3", "GROUP_B", "14"),("aa3","GROUP_B", "11"),("aa3","GROUP_B","12" ),("aa2", "GROUP_B",…
Dinosaurius
  • 8,306
  • 19
  • 64
  • 113
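The aggregation asked for here is a two-step one: count rows per (id, group) pair, then take the maximum count within each group. A plain-Python sketch on the complete tuples from the question (the truncated trailing rows are left out):

```python
# Count observations per (id, group), then take the max count per group,
# sketched in plain Python on the question's complete tuples.
from collections import Counter

my_data = [("aa1", "GROUP_A", "10"), ("aa1", "GROUP_A", "12"),
           ("aa2", "GROUP_A", "12"), ("aa3", "GROUP_B", "14"),
           ("aa3", "GROUP_B", "11"), ("aa3", "GROUP_B", "12")]

# step 1: count observations per (id, group) pair
pair_counts = Counter((mid, grp) for mid, grp, _ in my_data)

# step 2: keep the largest count seen in each group
max_per_group = {}
for (mid, grp), n in pair_counts.items():
    max_per_group[grp] = max(max_per_group.get(grp, 0), n)

print(max_per_group)   # {'GROUP_A': 2, 'GROUP_B': 3}
```

In Spark 1.6 the same shape would be two groupBys, roughly `df.groupBy("id", "group").count().groupBy("group").agg(max("count"))`.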
1
vote
2 answers

Pivot Spark Scala DataFrame

I am trying to use the pivot method in Scala Spark: val dfOutput = df_input.groupBy("memberlogin").pivot("country_group2").count() However, although there is no compilation error when creating a jar in Eclipse, during execution in Spark it gives…
1
vote
1 answer

DataFrame: too many arguments in the RDD object

I tried to use this question to convert an RDD object to a DataFrame in Spark. The case class in my use case contains more than 100 arguments (columns): case class MyClass(val1: String, ..., val104: String ) val df = rdd.map({ case Row(val1:…
Zied Hermi
  • 229
  • 1
  • 2
  • 11
1
vote
1 answer

Why does executing SQL against Hive table using SQLContext in application fail (but the same query in spark-shell works fine)?

I am using Spark 1.6. I am trying to connect to a table in my Spark SQL Java code with: JavaSparkContext js = new JavaSparkContext(); SQLContext sc = new SQLContext(js); DataFrame mainFile = sc.sql("Select * from db.table"); It gives me a table…
Aviral Kumar
  • 814
  • 1
  • 15
  • 40
1
vote
0 answers

Spark datasets: Exception when using groupBy MissingRequirementError

I am starting to work with Spark Datasets and am facing this exception when I execute a groupBy in Spark 1.6.1: case class RecordIdDate(recordId: String, date: String) val ds = sc.parallelize(List(RecordIdDate("hello","1"),…
1
vote
0 answers

Apache Spark self join big data set on multiple columns

I'm running Apache Spark on a Hadoop cluster using YARN. I have a big data set, something like 160 million records, and I have to perform a self join. The join is done on an exact match of one column (c1), a date overlap match, and a match on at least 1 of 2…
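The join condition described can be captured precisely: two date ranges [s1, e1] and [s2, e2] overlap when s1 ≤ e2 and s2 ≤ e1. A plain-Python sketch of the condition on invented sample records (the real join-key and column names are not in the excerpt):

```python
# Plain-Python sketch of a self join on exact c1 match plus date-range
# overlap. Ranges [s1, e1] and [s2, e2] overlap iff s1 <= e2 and s2 <= e1.
# The sample records are invented for illustration.
from datetime import date

records = [
    ("k1", date(2016, 1, 1), date(2016, 1, 10)),
    ("k1", date(2016, 1, 5), date(2016, 1, 20)),
    ("k1", date(2016, 2, 1), date(2016, 2, 5)),
    ("k2", date(2016, 1, 1), date(2016, 1, 2)),
]

def self_join(rows):
    out = []
    for i, (c1_a, s_a, e_a) in enumerate(rows):
        for c1_b, s_b, e_b in rows[i + 1:]:
            if c1_a == c1_b and s_a <= e_b and s_b <= e_a:
                out.append(((c1_a, s_a, e_a), (c1_b, s_b, e_b)))
    return out

print(len(self_join(records)))   # 1 -- only the first two k1 rows overlap
```

At 160 million records the pairwise loop above obviously does not scale; in Spark the usual shape is an equi-join on c1 first (so it shuffles by key) with the overlap and remaining predicates applied as a filter on the joined rows.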
1
vote
1 answer

KMeans with Spark 1.6.2 vs Spark 2.0.0

I am using KMeans() in an environment I have no control over and will abandon in less than a month. Spark 1.6.2 is installed. Should I pay the price of urging 'them' to upgrade to Spark 2.0.0 before I leave? In other words, does Spark 2.0.0 introduce any…
1
vote
0 answers

pyspark installation error, "ImportError: No module named pyspark"

I am trying to install Apache Spark 1.6.1 in standalone mode. I followed the guide at "https://github.com/KristianHolsheimer/pyspark-setup-guide". After executing $ sbt/sbt assembly, I tried $ ./bin/run-example SparkPi 10, but…
Sounak
  • 13
  • 3
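This ImportError usually means the Python-side packages under `$SPARK_HOME/python` are not on `sys.path`. A sketch of the usual fix; the install path and py4j zip version are assumptions that must match your Spark download (Spark 1.6 shipped py4j 0.9):

```python
# "ImportError: No module named pyspark" typically means Spark's bundled
# Python packages are missing from sys.path. Paths below are placeholders
# for your install; the py4j zip version varies by Spark release.
import os
import sys

spark_home = os.environ.get("SPARK_HOME", "/opt/spark-1.6.1")
sys.path.insert(0, os.path.join(spark_home, "python"))
sys.path.insert(0, os.path.join(
    spark_home, "python", "lib", "py4j-0.9-src.zip"))  # version varies

print(sys.path[0].endswith("py4j-0.9-src.zip"))        # True once on the path
```

The same two entries can instead be exported as `PYTHONPATH` in your shell profile, which is what most setup guides do.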
1
vote
1 answer

How to unit test Spark Streaming code?

I use the latest Spark 1.6.0. I looked at another Stack Overflow post, "How can I make Spark Streaming count the words in a file in a unit test?", and am trying to use the sample at https://gist.github.com/emres/67b4eae86fa92df69f61 for writing a sample…
CodeDreamer
  • 444
  • 2
  • 8
0
votes
0 answers

How to combine UDFs when creating a new column in Pyspark 1.6

I am trying to aggregate a table around one key value (id here) so that I can have one row per id and perform some verifications on the rows that belong to each id, in order to identify the 'result' (type of transaction of sorts). Let's…
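One common way to combine UDF logic in PySpark is to compose the plain Python functions first and register the composition once, rather than chaining registered UDFs column-by-column. The two functions below are invented examples; only the composition pattern is the point.

```python
# Combining the logic of two would-be UDFs: compose the plain Python
# functions, then wrap the composition once. Both functions here are
# hypothetical stand-ins for the asker's verification logic.
def classify(amount):            # hypothetical first UDF body
    return "big" if amount >= 100 else "small"

def label(kind):                 # hypothetical second UDF body
    return "txn:" + kind

def combined(amount):
    return label(classify(amount))

print(combined(250))   # txn:big
print(combined(3))     # txn:small

# In PySpark 1.6 the composition is wrapped once (needs a SQLContext):
#   from pyspark.sql.functions import udf
#   from pyspark.sql.types import StringType
#   result_udf = udf(combined, StringType())
#   df.withColumn("result", result_udf(df["amount"]))
```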