Questions tagged [apache-spark-1.4]
31 questions

Use for questions specific to Apache Spark 1.4. For general questions related to Apache Spark, use the tag [apache-spark].
1 vote · 1 answer
Spark: DecoderException: java.lang.OutOfMemoryError
I am running a Spark streaming application on a cluster with 3 worker nodes. Once in a while, jobs fail due to the following exception:
Job aborted due to stage failure: Task 0 in stage 4508517.0 failed 4 times, most recent failure: Lost task…

user3646174 · 45 · 5

1 vote · 1 answer
Slow or incomplete saveAsParquetFile from EMR Spark to S3
I have a piece of code that creates a DataFrame and persists it to S3. Below creates a DataFrame of 1000 rows and 100 columns, populated by math.Random. I'm running this on a cluster with 4 x r3.8xlarge worker nodes, and configuring plenty of…

Kirk Broadhurst · 27,836 · 16 · 104 · 169

1 vote · 1 answer
Spark 1.4 Mllib LDA topicDistributions() returning wrong number of documents
I have an LDA model running on corpus size of 12,054 documents with vocab size of 9,681 words and 60 clusters. I am trying to get the topic distribution over documents by calling .topicDistributions() or .javaTopicDistributions(). Both of these…

smannan · 136 · 1 · 1 · 4

1 vote · 2 answers
Spark SQL + Streaming issues
We are trying to implement a use case using Spark Streaming and Spark SQL that allows us to run user-defined rules against some data (See below for how the data is captured and used). The idea is to use SQL to specify the rules and return the…

Subhash Vaddiparty · 11 · 1

1 vote · 2 answers
Spark grouping and custom aggregation
I have data as below,
n1 d1 un1 mt1 1
n1 d1 un1 mt2 2
n1 d1 un1 mt3 3
n1 d1 un1 mt4 4
n1 d2 un1 mt1 3
n1 d2 un1 mt3 3
n1 d2 un1 mt4 4
n1 d2 un1 mt5 6
n1 d2 un1 mt2 3
I want to get the output as below
n1 d1 un1 0.75
n1 d2 un1…

Akash · 355 · 4 · 11
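The excerpt above truncates before the aggregation rule is fully specified, but its general shape — group rows by the first three fields and reduce each group's values with a custom function — can be sketched outside Spark in plain Python. The mean aggregator below is only a placeholder, not the asker's actual formula, and the sample rows are copied from the question:

```python
from collections import defaultdict

def group_aggregate(rows, agg):
    """Group rows of (n, d, un, metric, value) by (n, d, un) and
    reduce each group's values with the supplied agg function."""
    groups = defaultdict(list)
    for n, d, un, metric, value in rows:
        groups[(n, d, un)].append(value)
    return {key: agg(vals) for key, vals in groups.items()}

rows = [
    ("n1", "d1", "un1", "mt1", 1),
    ("n1", "d1", "un1", "mt2", 2),
    ("n1", "d1", "un1", "mt3", 3),
    ("n1", "d1", "un1", "mt4", 4),
    ("n1", "d2", "un1", "mt1", 3),
]
# Placeholder aggregation: the mean of each group's values.
result = group_aggregate(rows, agg=lambda vals: sum(vals) / len(vals))
print(result[("n1", "d1", "un1")])  # 2.5
```

In Spark this corresponds to a groupBy on the key columns followed by a custom aggregation; the pluggable `agg` parameter is where the asker's rule would go.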
1 vote · 1 answer
Compile error while calling updateStateByKey
Compile Error:
The method updateStateByKey(Function2<…, Optional<…>, Optional<…>>…

dexter · 451 · 1 · 4 · 19
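The contract behind `updateStateByKey` — a function receiving the current batch's new values for a key plus that key's previous (optional) state, and returning the new state — can be sketched outside Spark in plain Python. The names and the running-sum state below are illustrative, not Spark API:

```python
def update_state(new_values, prev_state):
    """Mimic updateStateByKey's update function: combine this batch's
    values for a key with its previous state (None if absent)."""
    if not new_values and prev_state is None:
        return None
    return (prev_state or 0) + sum(new_values)

def apply_batch(state, batch):
    """Apply one micro-batch of (key, value) pairs to the state dict."""
    per_key = {}
    for key, value in batch:
        per_key.setdefault(key, []).append(value)
    for key, values in per_key.items():
        state[key] = update_state(values, state.get(key))
    return state

state = apply_batch({}, [("a", 1), ("a", 2), ("b", 5)])
state = apply_batch(state, [("a", 3)])
print(state)  # {'a': 6, 'b': 5}
```

In the Java API the same function is typed as `Function2<List<V>, Optional<S>, Optional<S>>`, with `Optional` standing in for the `None`/value distinction above; compile errors like the one quoted usually come from mismatched type parameters in that signature.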
1 vote · 1 answer
CaseWhen in spark DataFrame
I'd like to understand how to use the CaseWhen expression with the new DataFrame API.
I can't see any reference to it in the documentation, and the only place I saw it was in the…

lev · 3,986 · 4 · 33 · 46
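The semantics of CaseWhen — an ordered list of (condition, value) branches plus a default, like SQL's CASE WHEN — can be sketched in plain Python. In PySpark this is exposed through `pyspark.sql.functions.when(...).otherwise(...)`; the helper and the grading example below are purely illustrative:

```python
def case_when(branches, default=None):
    """Return a function evaluating SQL-style CASE WHEN logic:
    branches is a list of (predicate, value) pairs tried in order;
    the first matching predicate wins, else default is returned."""
    def evaluate(row):
        for predicate, value in branches:
            if predicate(row):
                return value
        return default
    return evaluate

# Illustrative example: map a numeric score to a letter grade.
grade = case_when(
    [(lambda r: r["score"] >= 90, "A"),
     (lambda r: r["score"] >= 80, "B")],
    default="C",
)
print(grade({"score": 85}))  # B
```

Branch order matters: a row with score 95 satisfies both predicates but takes "A" because its branch is tried first, exactly as in SQL CASE WHEN.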
0 votes · 1 answer
pyspark 1.4 how to get list in aggregated function
I want to get a list of a column's values in an aggregate function in pyspark 1.4. collect_list is not available. Does anyone have a suggestion for how to do it?
Original columns:
ID, date, hour, cell
1, 1030, 01, cell1
1, 1030, 01, cell2
2, 1030, 01,…

Helen Z · 21 · 1 · 8
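Since `collect_list` only appeared in later Spark releases, the usual 1.4-era workaround is to drop to the underlying RDD and group values per key. The equivalent grouping can be sketched in plain Python; the column names come from the question, while the third row's cell value (truncated in the excerpt) is a made-up placeholder:

```python
from collections import defaultdict

def collect_cells(rows):
    """Group cell values per (ID, date, hour) key, emulating what a
    collect_list aggregation would produce for these columns."""
    grouped = defaultdict(list)
    for id_, date, hour, cell in rows:
        grouped[(id_, date, hour)].append(cell)
    return dict(grouped)

rows = [
    (1, "1030", "01", "cell1"),
    (1, "1030", "01", "cell2"),
    (2, "1030", "01", "cellX"),  # placeholder: truncated in the question
]
print(collect_cells(rows)[(1, "1030", "01")])  # ['cell1', 'cell2']
```

On an RDD this corresponds to keying each row by (ID, date, hour) and calling groupByKey, then converting the grouped values to lists.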
0 votes · 1 answer
Python versions in worker node and master node vary
Running spark 1.4.1 on CentOS 6.7. Have both python 2.7 and python 3.5.1 installed on it with anaconda.
Made sure that the PYSPARK_PYTHON env var is set to python3.5, but when I open the pyspark shell and execute a simple rdd transformation, it errors out…

Abhi · 1,153 · 1 · 23 · 38

0 votes · 1 answer
Spark worker node removed but not gone
I am using Spark standalone with a master and a single worker just to test. At first I used one worker box but now I decided to use a different worker box. To do this, I stopped the Master that was running, I changed the IP in the conf/slave file,…

user1342645 · 655 · 3 · 8 · 13

0 votes · 1 answer
Select values from a dataframe column
I would like to calculate the difference between two values from within the same column. Right now I just want the difference between the last value and the first value, however using last(column) returns a null result. Is there a reason last()…

the3rdNotch · 637 · 2 · 8 · 18
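The computation the question describes — the last value of a column minus the first — is trivial once the rows have a well-defined order, which is the usual catch in Spark: DataFrame rows carry no inherent ordering, so first/last are only meaningful after a sort. A plain-Python sketch of the intended arithmetic, with illustrative data:

```python
def last_minus_first(values):
    """Difference between the last and first values of an
    already-ordered sequence (the ordering must be established
    upstream, e.g. by sorting on a timestamp column)."""
    if not values:
        raise ValueError("empty column")
    return values[-1] - values[0]

print(last_minus_first([10, 12, 15, 19]))  # 9
```

The null result mentioned in the question is consistent with asking for "last" over an unordered or partitioned collection, where the notion of a last element is not well defined.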
0 votes · 1 answer
Databricks - How to create a Library with updated maven artifacts
We initially created a library in Databricks using a Maven artifact; all the jars are present in the library, and please note that this Maven artifact is ours.
We found a few issues with the artifact, fixed them, and updated it in Maven Central…

sag · 5,333 · 8 · 54 · 91

0 votes · 1 answer
Apache Spark 1.4.1 Build Failed
I have downloaded Apache Spark 1.4.1 from the official site.
I don't have Hadoop installed on my machine.
Apache provides a build command, so I tried to start building the project using the following command:
build/mvn -Pyarn -Phadoop-2.4…

Avinash Mishra · 1,346 · 3 · 21 · 41

0 votes · 1 answer
Spark 1.4 image for Google Cloud?
With bdutil, the latest tarball version I can find is for Spark 1.3.1:
gs://spark-dist/spark-1.3.1-bin-hadoop2.6.tgz
There are a few new DataFrame features in Spark 1.4 that I want to use. Is there any chance a Spark 1.4 image will be available for bdutil, or…

Haiying Wang · 652 · 7 · 10

0 votes · 4 answers
Why does insertInto fail when working with tables in non-default database?
I'm using Spark 1.4.0 (PySpark). I have a DataFrame loaded from a Hive table using this query:
sqlContext = HiveContext(sc)
table1_contents = sqlContext.sql("SELECT * FROM my_db.table1")
When I attempt to insert data from table1_contents after some…

oikonomiyaki · 7,691 · 15 · 62 · 101