Questions tagged [apache-spark-1.6]

Use for questions specific to Apache Spark 1.6. For general questions related to Apache Spark use the tag [apache-spark].

111 questions
4
votes
1 answer

How to read a CSV file with commas within a field using pyspark?

I have a csv file containing commas within a column value. For example, Column1,Column2,Column3 123,"45,6",789 The values are wrapped in double quotes when they have extra commas in the data. In the above example, the values are Column1=123,…
Bob
  • 335
  • 1
  • 4
  • 16
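In Spark 1.6 the spark-csv package handles this via its quoting support; the sketch below uses Python's standard csv module (not Spark) just to illustrate the quoting rule the file relies on.

```python
import csv
import io

# A minimal sketch of the quoting rule itself, using Python's standard
# csv module rather than the spark-csv package: fields wrapped in double
# quotes may contain commas and still parse as a single value.
data = 'Column1,Column2,Column3\n123,"45,6",789\n'
rows = list(csv.reader(io.StringIO(data)))
print(rows[1])  # ['123', '45,6', '789']
```

The quoted `"45,6"` survives as one field, which is the behavior a quote-aware CSV reader (including spark-csv's default quote handling) should give.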
4
votes
2 answers

NullPointerException while reading a column from the row

The following Scala (Spark 1.6) code for reading a value from a Row fails with a NullPointerException when the value is null. val test = row.getAs[Int]("ColumnName").toString while this works fine val test1 = row.getAs[Int]("ColumnName") // returns…
Anurag Sharma
  • 2,409
  • 2
  • 16
  • 34
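The NPE comes from calling .toString on the null that getAs returns; the usual guard is to test for null first (e.g. with Row.isNullAt). A plain-Python model of that guard, with a dict standing in for the Row (names are illustrative, not Spark API):

```python
# Plain-Python model of the null guard: check for null before converting,
# rather than calling toString on a value that may be null.
row = {"ColumnName": None}  # stands in for a Row whose int column is NULL
value = row["ColumnName"]
test = str(value) if value is not None else None
print(test)  # None, instead of an exception
```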
4
votes
1 answer

How to join on binary field?

In Scala/Spark, I am trying to do the following: val portCalls_Ports = portCalls.join(ports, portCalls("port_id") === ports("id"), "inner") However I am getting the following error: Exception in thread "main"…
Paul Reiners
  • 8,576
  • 33
  • 117
  • 202
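A workaround often suggested when an engine cannot compare a binary key directly is to join on an encoded form of the bytes instead. A plain-Python model of that idea, with hypothetical data (the names below are not from the question):

```python
# Plain-Python model: key the join on a hex encoding of the binary id
# so the comparison happens on strings rather than raw bytes.
port_calls = [{"port_id": b"\x01\x02", "vessel": "A"}]
ports = {b"\x01\x02".hex(): "Rotterdam"}  # id stored as hex string
joined = [
    (call["vessel"], ports[call["port_id"].hex()])
    for call in port_calls
    if call["port_id"].hex() in ports
]
print(joined)  # [('A', 'Rotterdam')]
```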
4
votes
3 answers

Why does single test fail with "Error XSDB6: Another instance of Derby may have already booted the database"?

I use Spark 1.6. We have an HDFS write method that used SQLContext. We now need to switch to HiveContext, but after the change the existing unit tests no longer run and fail with Error XSDB6: Another instance of Derby may have…
Satyam
  • 645
  • 2
  • 7
  • 20
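Embedded Derby allows only one JVM instance per database directory, so two contexts pointing at the same metastore trigger XSDB6. A sketch of the commonly suggested workaround, giving each test run its own Derby directory (derby.system.home is a standard Derby property):

```python
import tempfile

# Give each test JVM its own Derby system directory so two HiveContexts
# never open the same embedded metastore database.
derby_home = tempfile.mkdtemp(prefix="derby-")
java_opts = "-Dderby.system.home=" + derby_home
print(java_opts)
```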
4
votes
3 answers

Spark CSV package not able to handle \n within fields

I have a CSV file which I am trying to load using the Spark CSV package, and it does not load the data properly because a few of the fields contain \n within them, e.g. the following two rows "XYZ", "Test Data", "TestNew\nline", "OtherData" "XYZ", "Test…
Umesh K
  • 13,436
  • 25
  • 87
  • 129
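A quote-aware parser keeps an embedded newline inside a single field; Spark's CSV support only gained multi-line record parsing in later releases. The stdlib sketch below (plain Python, not spark-csv) shows the behavior the question expects:

```python
import csv
import io

# A quoted field may span a line break; a quote-aware parser keeps the
# embedded newline inside one field and still yields a single record.
data = '"XYZ","Test Data","TestNew\nline","OtherData"\n'
rows = list(csv.reader(io.StringIO(data)))
print(len(rows))     # 1 record, not 2
print(rows[0][2])    # 'TestNew\nline' stays a single field
```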
4
votes
3 answers

How to change hdfs block size in pyspark?

I use PySpark to write a Parquet file and I would like to change the HDFS block size of that file. I set the block size like this and it doesn't work: sc._jsc.hadoopConfiguration().set("dfs.block.size", "128m") Does this have to be set before starting…
Sean Nguyen
  • 12,528
  • 22
  • 74
  • 113
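One commonly reported pitfall is the value format: older Hadoop versions accept only a plain byte count here, so "128m" may be ignored, and the property must be set on the Hadoop configuration before the first write. A sketch of computing the byte value (property names are the standard HDFS ones):

```python
# Compute the block size as a plain byte count; older Hadoop versions do
# not parse size suffixes like "128m" for this property.
block_size_bytes = 128 * 1024 * 1024
conf = {
    "dfs.blocksize": str(block_size_bytes),   # newer property name
    "dfs.block.size": str(block_size_bytes),  # older, deprecated name
}
print(conf["dfs.blocksize"])  # 134217728
```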
4
votes
1 answer

Apache Spark: setting executor instances

I run my Spark application on YARN with parameters: in spark-defaults.conf: spark.master yarn-client spark.driver.cores 1 spark.driver.memory 1g spark.executor.instances 6 spark.executor.memory 1g in…
Anna
  • 98
  • 1
  • 7
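The same settings can be passed on the command line; a sketch of the roughly equivalent spark-submit invocation (note that --num-executors is honored only when dynamic allocation is disabled):

```shell
spark-submit \
  --master yarn-client \
  --driver-memory 1g \
  --num-executors 6 \
  --executor-memory 1g \
  app.jar
```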
3
votes
1 answer

Why does persist(StorageLevel.MEMORY_AND_DISK) give different results than cache() with HBase?

This may sound naive, but it is a problem I recently faced in my project and need a better understanding of. df.persist(StorageLevel.MEMORY_AND_DISK) Whenever we use such persist on an HBase read - the same data is…
3
votes
2 answers

How to replace nulls in Vector column?

I have a column of type [vector] and I have null values in it that I can't get rid of, here's an example import org.apache.spark.mllib.linalg.Vectors val sv1: Vector = Vectors.sparse(58, Array(8, 45), Array(1.0, 1.0)) val df_1 =…
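A fix often suggested is to coalesce each null entry to an explicit default vector of the same size (Spark-side, typically a UDF or when/otherwise producing Vectors.sparse(58, Array(), Array())). A plain-Python model of that replacement, with tuples standing in for vectors:

```python
# Plain-Python model: replace each null entry with an explicit "empty"
# default of the same size, mimicking coalesce over a vector column.
size = 58
default = (size, (), ())                      # stands in for an empty sparse vector
column = [(size, (8, 45), (1.0, 1.0)), None]  # second entry is NULL
filled = [v if v is not None else default for v in column]
print(filled[1])  # (58, (), ())
```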
3
votes
1 answer

How to load spark.mllib model without SparkContext to predict?

With Spark 1.6.0 MLlib, I build a model (like RandomForest) and save it to HDFS; I would then like to load the RandomForest model from HDFS to predict without a SparkContext. Currently, to load the model we use: val loadModel =…
shaojie
  • 121
  • 1
  • 11
3
votes
1 answer

scala dataframe filter array of strings

Spark 1.6.2 and Scala 2.10 here. I want to filter a Spark DataFrame column with an array of strings. val df1 = sc.parallelize(Seq((1, "L-00417"), (3, "L-00645"), (4, "L-99999"),(5, "L-00623"))).toDF("c1","c2") +---+-------+ | c1| …
Ramesh
  • 1,563
  • 9
  • 25
  • 39
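In Spark 1.6 the analogous DataFrame call is Column.isin; the plain-Python sketch below models the same filter on the question's sample data:

```python
# Plain-Python model of filtering a column against a list of wanted values;
# in Spark this corresponds to df.filter(col("c2").isin(wanted: _*)).
rows = [(1, "L-00417"), (3, "L-00645"), (4, "L-99999"), (5, "L-00623")]
wanted = {"L-00417", "L-00645", "L-00623"}
kept = [r for r in rows if r[1] in wanted]
print(kept)  # the L-99999 row is dropped
```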
3
votes
1 answer

Where can I find the jars folder in Spark 1.6?

From the Spark downloads page, if I download the tar file for v2.0.1, I see that it contains some jars that I find useful to include in my app. If I download the tar file for v1.6.2 instead, I don't find the jars folder in there. Is there an…
sudheeshix
  • 1,541
  • 2
  • 17
  • 28
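Spark 1.x distributions bundle everything into a single assembly jar under lib/ rather than a jars/ directory (the split into individual jars arrived with 2.0). A sketch of the 1.6.2 layout (file names vary by Hadoop build):

```shell
# Spark 1.6.2 layout (hadoop2.6 build shown; exact names vary by build):
ls spark-1.6.2-bin-hadoop2.6/lib/
# spark-assembly-1.6.2-hadoop2.6.0.jar  datanucleus-*.jar  ...
```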
3
votes
2 answers

Combining Spark schema without duplicates?

To process the data I have, I am extracting the schema before, so that when I read the dataset, I provide the schema instead of going through the expensive step of inferring schema. In order to construct the schema, I need to merge in several…
THIS USER NEEDS HELP
  • 3,136
  • 4
  • 30
  • 55
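A common approach is to fold the schemas together while keeping each field name once. The plain-Python sketch below models that deduplicating merge with (name, type) pairs; Spark-side the same fold works over StructType field lists:

```python
# Plain-Python model of merging schemas without duplicate fields: fold the
# (name, type) pairs into a dict, keeping the first occurrence of a name.
schema_a = [("id", "int"), ("name", "string")]
schema_b = [("name", "string"), ("age", "int")]
merged = dict(schema_a)
for field, dtype in schema_b:
    merged.setdefault(field, dtype)
print(list(merged))  # ['id', 'name', 'age'] -- each field kept once
```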
3
votes
1 answer

Spark Streaming application fails with KafkaException: String exceeds the maximum size or with IllegalArgumentException

TL;DR: My very simple Spark Streaming application fails in the driver with the "KafkaException: String exceeds the maximum size". I see the same exception in the executor but I also found somewhere down the executor's logs an…
3
votes
2 answers

How to control number of partition while reading data from Cassandra?

I use: Cassandra 2.1.12 (3 nodes), Spark 1.6 (3 nodes), Spark Cassandra Connector 1.6. I use tokens in Cassandra (not vnodes). I am writing a simple job that reads data from a Cassandra table and displays its count; the table has around 70…
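The connector exposes a split size that determines how much Cassandra data lands in one Spark partition; a configuration sketch (property name per Spark Cassandra Connector 1.6):

```shell
# Smaller split size => more Spark partitions when reading from Cassandra
--conf spark.cassandra.input.split.size_in_mb=64
```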