Questions tagged [apache-spark-2.0]

Use for questions specific to Apache Spark 2.0. For general questions related to Apache Spark use the tag [apache-spark].

464 questions
0 votes • 0 answers

Find maximum for a timestamp through Spark groupBy dataset

I would like to find the last record for an ID in a typed Dataset. I found a DataFrame-based solution: "Find minimum for a timestamp through Spark groupBy dataframe". But how to do the…
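For a typed Dataset the usual idiom is `groupByKey` followed by `reduceGroups`, keeping whichever record carries the later timestamp. The reduction itself can be sketched in plain Python (the record shape and sample values below are illustrative assumptions, not from the question):

```python
from functools import reduce
from collections import defaultdict

def latest_per_id(rows):
    """Keep the record with the maximum timestamp per id, mirroring
    Dataset.groupByKey(_.id).reduceGroups(pick the later record)."""
    groups = defaultdict(list)
    for rid, ts in rows:
        groups[rid].append((rid, ts))
    # reduceGroups: pairwise reduction that keeps the later record
    return {rid: reduce(lambda a, b: a if a[1] >= b[1] else b, recs)
            for rid, recs in groups.items()}

# Hypothetical (id, timestamp) records
records = [("a", 3), ("a", 7), ("b", 5), ("b", 2)]
print(latest_per_id(records))  # {'a': ('a', 7), 'b': ('b', 5)}
```

The Spark equivalent reduces each group with the same comparison function, so the result is one full record per key rather than just the aggregated column.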
0 votes • 1 answer

Getting the first row of the element from an array

I want to get the first row from a Spark 2 Dataset. The dataset is as follows: |arrayValue | +-------------------------------------------------------------+ |[1.47527718E12, 134535353E12] …
Luckylukee • 575 • 2 • 9 • 27
0 votes • 1 answer

Apache Spark with Java, converting to Date Type from Varchar2 in Oracle fails

I have a use case where I want to read data from one Oracle table, where all fields are of varchar type, and save it to another Oracle table with similar fields but ideally with the correct datatypes. This has to be done only in Java. So I want to read Dataset…
abhihello123 • 1,668 • 1 • 22 • 38
0 votes • 1 answer

Spark Dataset or Dataframe for Aggregation

We have a MapR cluster with Spark version 2.0. We are trying to measure the performance difference of a Hive query that currently runs on the TEZ engine versus running it on Spark SQL, simply by writing the SQL query in an .hql file and then calling…
0 votes • 1 answer

Spark on EMR "exceeding memory limits" for checkpointed/cached job

Is my understanding of caching wrong? The resulting RDD after all my transformations is incredibly small, like 1 GB. The data it was computed from is quite large, ~700 GB in size. I have to run logic to read in thousands of pretty big files, all to…
0 votes • 0 answers

Spark 2.11 with Java, Saving DataFrame in Oracle creates columns with double quotes

Using the following code in Spark (Java), we save a dataframe to Oracle; it also creates the table if it doesn't exist. Dataset someAccountDF = sparkSession.createDataFrame(impalaAccountsDF.toJavaRDD(),…
0 votes • 3 answers

Spark job fails connecting to Oracle on first attempt

We are running a Spark job which connects to Oracle and fetches some data. Attempt 0 or 1 of the JDBCRDD task always fails with the error below; on a subsequent attempt the task completes. As suggested on a few portals, we even tried with…
Rishi Saraf • 1,644 • 2 • 14 • 27
0 votes • 1 answer

Why is there no support for sparkSession with namedObject in Spark Job Server?

I am trying to build an application with the Spark Job Server API (for Spark 2.2.0), but I found that there is no support for namedObject with sparkSession. My code looks like: import com.typesafe.config.Config import org.apache.spark.sql.SparkSession import…
arglee • 1,374 • 4 • 17 • 30
0 votes • 0 answers

Spark streaming saving dataframe fails

I am using Spark 2.2 to write to Redshift on an AWS cluster, and it is failing with the error below. I am using CDH 5.10 and Scala 2.11.8. Any ideas on how to fix this? Is it missing the snappy dependency? WARN TaskSetManager:66 - Lost task 0.0 in…
0 votes • 1 answer

Specify Azure key in Spark 2.x version

I'm trying to access a wasb (Azure Blob Storage) file in Spark and need to specify the account key. How do I specify the account key in the spark-env.sh file? fs.azure.account.key.test.blob.core.windows.net …
user1050619 • 19,822 • 85 • 237 • 413
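One common approach (assuming the hadoop-azure connector is on the classpath) is to pass the Hadoop property through Spark's `spark.hadoop.` prefix rather than through spark-env.sh. A minimal config sketch — the account name comes from the question, the key value is a placeholder:

```
# spark-defaults.conf (or --conf on spark-submit); YOUR_ACCOUNT_KEY is a placeholder
spark.hadoop.fs.azure.account.key.test.blob.core.windows.net  YOUR_ACCOUNT_KEY
```

Alternatively the same property can be set at runtime on the Hadoop configuration, e.g. `spark.sparkContext.hadoopConfiguration.set(...)` in Scala, before reading any `wasb://` path.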
0 votes • 3 answers

How to mask columns using Spark 2?

I have some tables in which I need to mask some of the columns. The columns to be masked vary from table to table, and I am reading them from an application.conf file. For example, for the employee table shown below +----+------+-----+---------+ |…
Shekhar • 11,438 • 36 • 130 • 186
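The masking rule itself is plain string work; in Spark 2 it would typically be wrapped in a UDF and applied with `withColumn` for each column listed in application.conf. A minimal plain-Python sketch of such a rule (the function name and the keep-last-4 convention are illustrative assumptions):

```python
def mask(value, visible=4, char="*"):
    """Replace all but the last `visible` characters with `char`;
    None passes through unchanged, as a Spark UDF would handle nulls."""
    if value is None:
        return None
    s = str(value)
    return char * max(len(s) - visible, 0) + s[-visible:]

print(mask("1234567890"))  # ******7890
print(mask("abc"))         # abc (shorter than the visible window)
```

Registered as a UDF, the same function could then be applied in a loop over the configured column names, replacing each column with its masked version.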
0 votes • 1 answer

Spark on HBase Jars

I am trying to run the SparkOnHbase example mentioned here -> Spark On Hbase. I am just trying to compile and run the code on my local Windows machine. My build.sbt snippet is below: scalaVersion := "2.11.8" libraryDependencies…
AJm • 993 • 2 • 20 • 39
0 votes • 0 answers

Unable to save RDD to HDFS in Apache Spark

I am getting the following error while trying to save the RDD to HDFS 17/09/13 17:06:42 WARN TaskSetManager: Lost task 7340.0 in stage 16.0 (TID 100118, XXXXXX.com, executor 2358): java.io.IOException: Failing write. Tried pipeline recovery 5 times…
vdep • 3,541 • 4 • 28 • 54
0 votes • 2 answers

Checkpointing With NOT Serializable

I want to understand a basic issue. Here is my code: def createStreamingContext(sparkCheckpointDir: String, batchDuration: Int) = { val ssc = new StreamingContext(spark.sparkContext, Seconds(batchDuration)) ssc } val ssc =…
Ayan Guha • 750 • 3 • 10
0 votes • 2 answers

Kudu with PySpark2: Error with KuduStorageHandler

I am trying to read data stored in Kudu using PySpark 2.1.0: >>> from os.path import expanduser, join, abspath >>> from pyspark.sql import SparkSession >>> from pyspark.sql import Row >>> spark = SparkSession.builder \ .master("local") \ …