Questions tagged [apache-spark-2.0]
464 questions

Use for questions specific to Apache Spark 2.0. For general questions related to Apache Spark, use the tag [apache-spark].
0 votes, 1 answer
Apache Spark 2.0 - date_add function
I have a simple schema with a date and an int. I want to use date_add to add the int to the date.
scala> val ds1 = spark.read.option("inferSchema",true).csv("samp.csv")
ds1.printSchema();
root
|-- _c0: timestamp (nullable = true)
|-- _c1:…

coder AJ
0 votes, 1 answer
Spark 2.0: A named function inside mapGroups for sql.KeyValueGroupedDataset causes java.io.NotSerializableException
Anonymous functions work fine.
The following code sets up the problem:
import sparkSession.implicits._
val sparkSession = SparkSession.builder.appName("demo").getOrCreate()
val sc = sparkSession.sparkContext
case class DemoRow(keyId: Int, evenOddId:…

Y.G.
0 votes, 1 answer
Spark Java multithreading vs running individual Spark jobs
I am new to Spark and am trying to understand the performance difference between the approaches below (Spark on Hadoop).
Scenario: As part of batch processing I have 50 Hive queries to run. Some can run in parallel and some sequentially.
- First approach
All of the queries can…

user2895589
0 votes, 2 answers
Spark 2.0 CSV Error
I am upgrading to Spark 2 from 1.6 and am having an issue reading in CSV files. In Spark 1.6 I would have something like this to read in a CSV file.
val df = sqlContext.read.format("com.databricks.spark.csv")
.option("header",…

st33l3rf4n
0 votes, 0 answers
spark-sql - using nested query to filter data
I have a huge .csv file which has several columns, but the columns of importance to me are USER_ID (user identifier), DURATION (duration of call), TYPE (incoming or outgoing), DATE, and NUMBER (mobile no.).
So what I am trying to do is : replace all null…

sensitive_piece_of_horseflesh
0 votes, 1 answer
Apache Spark isn't playing nice with Jersey dependency injection
I'm trying to use the com.github.sps.metrics.metrics-opentsdb library to log metrics from my spark job to my OpenTSDB server. I'm running into an issue where I get a strange NPE down in the jersey code that deals with EncodingFilters.
Here is the…

Hardy
0 votes, 1 answer
Is there any Google/AWS service to move data from Google Cloud Storage to S3?
In my use case, all Google-related app and ads data is stored in Google Cloud Storage, but my processing engine runs on Spark in the AWS cloud.
Can someone please help with how I can move this GCS data to S3 for processing?
Thank you in advance
0 votes, 1 answer
How to persist a DataFrame to a Hive table?
I use CentOS on the Cloudera QuickStart VM. I created an sbt-managed Spark application following the other question How to save DataFrame directly to Hive?.
build.sbt
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.5.2"
libraryDependencies…

sdinesh94
0 votes, 1 answer
createOrReplaceTempView does not work on an empty DataFrame in PySpark 2.0.0
I am trying to define a SQL view on a PySpark DataFrame (2.0.0) and am getting errors like "Table or View Not found". What I am doing: 1. create an empty dataframe 2. load data from a different location into a temp dataframe 3. append the temp data frame…

braj
0 votes, 1 answer
Cassandra select query with multiple params
Using Cassandra 2.28, the Java connector 3, and Spark 2.0.
I am trying to write a simple query with multiple select params but am unable to get the syntax right.
A single param works:
CassandraJavaRDD rdd = javaFunc
…

Sam-T
0 votes, 2 answers
What is the behavior of transformations and actions in Spark?
We're performing some tests to evaluate the behavior of transformations and actions in Spark with Spark SQL. In our tests, first we conceive a simple dataflow with 2 transformations and 1 action:
LOAD (result: df_1) > SELECT ALL FROM df_1 (result:…

Brccosta
0 votes, 1 answer
Apache Spark join with dynamic re-partitioning
I'm trying to do a fairly straightforward join on two tables, nothing complicated. Load both tables, do a join and update columns, but it keeps throwing an exception.
I noticed the task is stuck on the last partition 199/200 and eventually crashes.…

Philip K. Adetiloye
0 votes, 0 answers
How to create two columns from a single column in a dataframe using pyspark
I have to transform a dataframe which looks like this:
+---------+------+
| Country|Status|
+---------+------+
|[AW,null]| 14|
|[UG,null]| 47|
|[CY,null]| 1324|
|[AO,null]| 20|
|[US,null]|325242|
|[KE,null]| 246|
|[DK,true]| …

Mukesh Jha
0 votes, 0 answers
How to transform a Dataset of a known type to one with a generic type
So I've got this example code where I have a Dataset[Event] which I would like to group based on a key of generic type computed on the fly.
import org.apache.spark.sql.{ Dataset, KeyValueGroupedDataset }
case class Event(id: Int, name:…

aa8y
0 votes, 0 answers
DataFrame save to Redshift from a Spark 2 job running on a Dataproc cluster stalls
I have a dataframe (Dataset) and want to save this dataframe to Redshift.
df.write()
.format("com.databricks.spark.redshift")
.option("url", url)
.option("dbtable", dbTable)
.option("tempdir", tempDir)
.mode("append")
…

Christian