Questions tagged [spark3]

Use this tag for questions related to Apache Spark 3.0.0 and higher.

This tag is kept separate from the apache-spark tag because Spark 3.x introduces breaking changes.

Apache Spark is a unified analytics engine for large-scale data processing.

80 questions
0
votes
1 answer

How to read map in spark3 with java

Dataset<Person> person = spark.read().textFile(path).map(Person::new, Encoders.bean(Person.class)). When I try the above, it works in Spark 2.4 (Scala 2.11), but in Spark 3.1.1 (Scala 2.12) the call is reported as ambiguous for the type Dataset. And also wherever I use…
Anuradha
  • 1
  • 1
0
votes
1 answer

How to get add_months Spark 2 behaviour in Spark 3

We are migrating a huge codebase from Spark 2 to Spark 3.x. To make the migration incremental, some configs were set to legacy to keep the same behaviour as in Spark 2.x. The function add_months, however, AFAIK does not have a "legacy"…
Diego
  • 1
  • 2
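For context on the behaviour change this question is about: as far as I know, the Spark 3.0 migration notes state that add_months no longer snaps month-end inputs to the month end of the result. A minimal plain-Python sketch of the old Spark 2.x semantics (the helper name is hypothetical, not a Spark API):

```python
import calendar
from datetime import date

def add_months_spark2_style(d: date, n: int) -> date:
    """Mimic Spark 2.x add_months: if the input date is the last day of
    its month, snap the result to the last day of the target month."""
    was_month_end = d.day == calendar.monthrange(d.year, d.month)[1]
    # Shift year/month, clamping the day to the target month's length.
    total = (d.year * 12 + d.month - 1) + n
    year, month = divmod(total, 12)
    month += 1
    last_day = calendar.monthrange(year, month)[1]
    day = last_day if was_month_end else min(d.day, last_day)
    return date(year, month, day)

# Spark 2.x snapped month-end inputs to the month end of the result:
print(add_months_spark2_style(date(2019, 2, 28), 1))  # 2019-03-31
# Spark 3.x simply keeps the day-of-month (clamped), i.e. 2019-03-28.
```

A wrapper like this could be registered as a UDF to reproduce the old behaviour where a legacy config is not available.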
0
votes
1 answer

Spark Structured Streaming writeStream outputs no data but no error

I have a Structured Streaming job which reads messages from a Kafka topic and then saves them to DBFS. The code is as follows: input_stream = spark.readStream \ .format("kafka") \ .options(**kafka_options) \ .load() \ .transform(create_raw_features) #…
0
votes
1 answer

Jackson databind error with scalatest Flatspec

I was trying to execute the Scala test cases in IntelliJ using Gradle with Spark 3.1.1 & Scala 2.12.13, but the tests were failing with the jackson-databind error below. val conf = new SparkConf().setMaster("local[2]") val spark =…
vamsi
  • 344
  • 5
  • 22
0
votes
1 answer

Need help migrating from Spark 2.0 to Spark 3.1 - Accumulable to AccumulatorV2

I'm working on adding Spark 3.1 and Scala 2.12 support to the Kylo Data-Lake Management Platform. I need help migrating the following functions: /** * Creates an {@link Accumulable} shared variable with a name for display in the Spark…
SaleemKhair
  • 499
  • 3
  • 12
0
votes
0 answers

Spark UDF: Apply np.sum over a list of values in a data frame and filter values based on threshold

Very new to using Spark for data manipulation and UDFs. I have a sample df with different test scores. There are 50 different columns like these. I am trying to define a custom apply function to filter values (total counts in each row) which are…
Hackerds
  • 1,195
  • 2
  • 16
  • 34
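Before wrapping row-level logic like this in a Spark UDF, it can help to prototype it in plain Python. A minimal sketch of the sum-and-threshold step (the column names and threshold value are made-up assumptions, not from the question):

```python
# Row-level logic for a prospective Spark UDF: sum a list of per-test
# scores and keep only rows whose total clears a threshold.
def total_score(scores):
    return sum(s for s in scores if s is not None)  # ignore nulls

rows = [
    {"id": "a", "scores": [10, 20, 30]},
    {"id": "b", "scores": [5, None, 5]},
]
threshold = 25
kept = [r["id"] for r in rows if total_score(r["scores"]) >= threshold]
print(kept)  # ['a']
```

In PySpark the same function could be registered as a UDF, though for a plain sum over array columns the built-in SQL functions are usually faster than a Python UDF.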
0
votes
0 answers

Convert Spark 2.2's UDAF to a 3.0 Aggregator

I have an already-written UDAF in Scala using Spark 2.4. Since our Databricks cluster was on the 6.4 runtime, which is no longer supported, we need to move to 7.3 LTS, which has long-term support and uses Spark 3. UDAF is deprecated in Spark 3 and will…
0
votes
1 answer

spark3 crashes with py4j.protocol.Py4JJavaError

I'm trying to migrate from emr-5.28.0 (Spark 2.4.4) to emr-6.2.0 (Spark 3.0.1), and the most basic usage of Spark crashes no matter what I do. This is my test_pyspark.py file: from pyspark.sql import SparkSession spark =…
Ben Siman
  • 53
  • 2
  • 6
0
votes
1 answer

Elasticsearch plugin for PySpark 3.1.1

I used Elasticsearch Spark 7.12.0 with PySpark 2.4.5 successfully; both reads and writes were perfect. Now that I'm testing the upgrade to Spark 3.1.1, this integration doesn't work anymore. There were no code changes in PySpark between 2.4.5 & 3.1.1. Is there a…
Sahas
  • 3,046
  • 6
  • 32
  • 53
0
votes
3 answers

How to read such a nested multiline json file into a data frame with Spark/Scala

I have the following JSON: { "value":[ {"C1":"val1","C2":"val2"}, {"C1":"val1","C2":"val2"}, {"C1":"val1","C2":"val2"} ] } that I am trying to read like this: spark.read .option("multiLine",…
CoolStraw
  • 5,282
  • 8
  • 42
  • 64
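The JSON in this question is a single object whose "value" key holds the array of records, so a line-oriented reader sees malformed input. A plain-Python sketch of the flattening step itself:

```python
import json

raw = '''{ "value":[ {"C1":"val1","C2":"val2"},
                     {"C1":"val1","C2":"val2"},
                     {"C1":"val1","C2":"val2"} ] }'''

# The whole file is one JSON document: parse it as such, then pull the
# records out of the "value" array.
rows = json.loads(raw)["value"]
print(len(rows), rows[0])  # 3 {'C1': 'val1', 'C2': 'val2'}
```

In Spark the rough equivalent is reading with the multiLine option and then exploding the "value" array column into rows; treat that as a sketch of the approach, not the asker's exact code.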
0
votes
1 answer

Does Spark 3.0.1 support custom Aggregators on window functions?

I wrote a custom Aggregator (an extension of org.apache.spark.sql.expressions.Aggregator) and Spark invokes it correctly as an aggregating function under group by statement: sparkSession .createDataFrame(...) .groupBy(col("id")) .agg( …
igor
  • 33
  • 3
0
votes
1 answer

Pyspark.ml - Error when loading model and Pipeline

I want to import a trained pyspark model (or pipeline) into a pyspark script. I trained a decision tree model like so: from pyspark.ml.classification import DecisionTreeClassifier from pyspark.ml.feature import VectorAssembler from…
FVCC
  • 262
  • 2
  • 16
0
votes
1 answer

Apache spark 3.0 with HDP 2.6 stack

We are planning to set up Apache Spark 3.0 outside of the existing HDP 2.6 cluster and to submit jobs using YARN (v2.7) in that cluster without upgrading or modifying it. Currently users are using Spark 2.3, which is included in the HDP stack. The goal is to…
0
votes
1 answer

Spark 3 is failing when I try to execute a simple query

I have this table on Hive: CREATE TABLE `mydb`.`raw_sales` ( `combustivel` STRING, `regiao` STRING, `estado` STRING, `jan` STRING, `fev` STRING, `mar` STRING, `abr` STRING, `mai` STRING, `jun` STRING, `jul` STRING, `ago` STRING, `set` STRING, `out`…
Andre Carneiro
  • 708
  • 1
  • 5
  • 27
0
votes
1 answer

find set of keys in Scala map where values overlap

I'm working with a map object in Scala where the key is a basket ID and the value is the set of item IDs contained within that basket. The goal is to ingest this map object and compute, for each basket, the set of other basket IDs that contain at least…
tyjchen
  • 5
  • 2
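A plain-Python sketch of the basket-overlap computation described in the last question: an inverted index from item to baskets avoids comparing every pair of baskets directly (the data and names below are illustrative, not from the question):

```python
from collections import defaultdict

baskets = {
    "b1": {"i1", "i2"},
    "b2": {"i2", "i3"},
    "b3": {"i4"},
}

# Invert the map: item -> set of baskets containing that item.
by_item = defaultdict(set)
for basket, items in baskets.items():
    for item in items:
        by_item[item].add(basket)

# For each basket, union the basket sets of its items, minus itself.
overlaps = {
    basket: set().union(*(by_item[i] for i in items)) - {basket}
    for basket, items in baskets.items()
}
print(overlaps["b1"])  # {'b2'}
```

The same inverted-index idea translates to Scala with Map and Set, or to a Spark explode-and-self-join if the map is too large for one machine.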