Questions tagged [scala-spark]
49 questions
0
votes
0 answers
AWS Glue Scala Spark job failing - org.apache.spark.util.collection.CompactBuffer[] not registered in Kryo
The code segment below is failing, according to the Spark UI history server:
segmentIdToTripIdsRDD.join(segmentIdToRSMSegmentRDD)
.map(tuple => {
val tripIds: Iterable[String] = tuple._2._1._1
…

Aki008
- 405
- 2
- 6
- 19
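A sketch of the two usual ways out of this Kryo error, assuming the Glue job builds its own SparkConf. CompactBuffer is private[spark], so it has to be registered by name:

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .registerKryoClasses(Array(
    Class.forName("org.apache.spark.util.collection.CompactBuffer"),
    // "CompactBuffer[]" in the error message is the array class
    Class.forName("[Lorg.apache.spark.util.collection.CompactBuffer;")
  ))
// Alternatively, stop requiring registration altogether:
// conf.set("spark.kryo.registrationRequired", "false")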
0
votes
1 answer
Spark: extract values from a JSON struct
I have a Spark DataFrame column (custHeader) in the format below, and I want to extract the value of the key phone into a separate column. I am trying to use the from_json function, but it gives me a null value.
valArr:array
element:struct
…

marc
- 319
- 1
- 5
- 20
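from_json returns null whenever the supplied schema does not match the JSON, so the schema has to mirror the valArr array of structs. A minimal sketch, assuming a hypothetical key/value layout for the struct elements:

import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

val headerSchema = ArrayType(StructType(Seq(
  StructField("key", StringType),
  StructField("value", StringType)
)))

val parsed = df
  .withColumn("valArr", from_json(col("custHeader"), headerSchema))
  // keep only the element whose key is "phone", then take its value
  .withColumn("phone",
    element_at(expr("filter(valArr, x -> x.key = 'phone')"), 1)("value"))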
0
votes
1 answer
Spark broadcasts the right dataset of a left join, which causes org.apache.spark.sql.execution.OutOfMemorySparkException
Spark broadcasts the right dataset of a left join, which causes org.apache.spark.sql.execution.OutOfMemorySparkException: Size of broadcasted table far exceeds estimates and exceeds limit of spark.driver.maxResultSize, even though I used settings to disable…

alsetr
- 13
- 3
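For reference, the knobs this usually comes down to, sketched for a left outer join of two hypothetical datasets left and right. Note that on Spark 3.2+ adaptive execution has its own threshold that must be disabled separately:

// size-based broadcasting off for both the planner and AQE
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")
spark.conf.set("spark.sql.adaptive.autoBroadcastJoinThreshold", "-1")

// or force a shuffle-based strategy for this one join
val joined = left.hint("merge").join(right, Seq("key"), "left")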
0
votes
0 answers
Spark ColumnarBatches and storing them in an InMemoryRelation for fast queries in Spark Scala
I have been trying to implement an InMemoryRelation based on Spark ColumnarBatches; so far I have not been able to store the vectorised ColumnarBatch in the relation. Is there a way to achieve this without going through an intermediary representation…
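InMemoryRelation is internal API (org.apache.spark.sql.execution.columnar), with no supported way to feed it user-built ColumnarBatches; the supported route into Spark's columnar cache is persisting a DataFrame. A sketch of that route, which at least makes the relation visible in the plan:

import org.apache.spark.storage.StorageLevel

val cached = df.persist(StorageLevel.MEMORY_ONLY)
cached.count() // materialise the cache
// the optimized plan now contains an InMemoryRelation node
println(cached.queryExecution.optimizedPlan)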
0
votes
3 answers
Convert Vector[String] to DataFrame in Scala Spark
I have this Vector[String]:
user_uid,score,value
255938,34096,8
259117,34599,10
253664,28891,7
How can I convert it to a DataFrame?
I already tried this:
val dataInVectorRow = dataInVectorString
.map(_.split("\\s+"))
.map(x =>…

AT181903
- 11
- 4
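The rows shown are comma-separated, so splitting on "\\s+" yields one un-split string per line. A runnable sketch, assuming a SparkSession named spark:

import spark.implicits._

val dataInVectorString = Vector(
  "user_uid,score,value",
  "255938,34096,8",
  "259117,34599,10",
  "253664,28891,7")

val header = dataInVectorString.head.split(",")
val df = dataInVectorString.tail
  .map(_.split(","))                                  // split on commas
  .map { case Array(uid, score, value) => (uid, score, value) }
  .toDF(header: _*)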
0
votes
1 answer
Spark Scala: exploding a struct array throws an "ambiguous reference to fields" error
Currently I'm working on exploding a struct array in which pairs of keys are the same:
{
  "A": [{
    "AA": {
      "AB": "21",
      "AC": "R",
      "AD": "20222832522117601",
      "AE": "2",
      "AF": {
        …

instancedeveloper
- 11
- 3
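"Ambiguous reference to fields" usually means the struct carries field names that differ only in case, which collide under Spark's default case-insensitive resolution. One common fix, sketched against the fields visible above:

import org.apache.spark.sql.functions._

spark.conf.set("spark.sql.caseSensitive", "true") // fields now resolve case-sensitively

val exploded = df
  .select(explode(col("A")).as("a"))
  .select(col("a.AA.AB"), col("a.AA.AC"), col("a.AA.AD"))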
0
votes
0 answers
Get Error Records from deequ VerificationSuite
When we run a deequ VerificationSuite, can we see the failing input records for each rule when a rule reports an error? For example: if rule1 failed for 10 records out of a total of 100 records, I only see a summary which says this…

PythonDeveloper
- 289
- 1
- 4
- 24
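The summary deequ returns here is aggregate-only; a common workaround is to re-apply the rule's predicate to the input to recover the offending rows. A sketch with a hypothetical completeness rule on a column phone:

import com.amazon.deequ.{VerificationResult, VerificationSuite}
import com.amazon.deequ.checks.{Check, CheckLevel}
import org.apache.spark.sql.functions.col

val result = VerificationSuite()
  .onData(inputDf)
  .addCheck(Check(CheckLevel.Error, "rule1").isComplete("phone"))
  .run()

// aggregate summary only: rule name, status, constraint message
VerificationResult.checkResultsAsDataFrame(spark, result).show(false)

// the failing rows themselves: negate the rule's predicate by hand
val badRecords = inputDf.filter(col("phone").isNull)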
0
votes
1 answer
Save a DataFrame with a records limit, but also make sure the same value is not spread across multiple files
Suppose I have this DataFrame:

id | value
---+------
A  | 1
A  | 2
A  | 3
B  | 1
B  | 2
C  | 1
D  | 1
D  | 2

and so on. Basically, I want to make sure that even with a records limit, any given id can only appear in one single file (suppose the number of entries with that…

ForkPork
- 37
- 4
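One sketch of the usual trade-off: hash-partitioning on id pins every row of a given id to one task, and hence one output file, but it forgoes maxRecordsPerFile, which would split an id across files. The partition count and path are arbitrary assumptions:

import org.apache.spark.sql.functions.col

df.repartition(200, col("id"))
  .write
  .mode("overwrite")
  .parquet("/tmp/output") // hypothetical path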
0
votes
1 answer
Why is the behavior different when mixed case is used vs. the same case in Spark 3.2?
I am running a simple query in Spark 3.2:
val df1 = sc.parallelize(List((1,2,3,4,5),(1,2,3,4,5))).toDF("id","col2","col3","col4", "col5")
val op_cols_same_case = List("id","col2","col3","col4", "col5", "id")
val df2 =…

ASR
- 53
- 6
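The knob that governs this is spark.sql.caseSensitive (false by default), under which "id" and "ID" resolve to the same column of df1. A sketch to observe the difference:

import org.apache.spark.sql.functions.col

println(spark.conf.get("spark.sql.caseSensitive")) // false by default

val op_cols_mixed_case = List("id", "col2", "col3", "col4", "col5", "ID")
// under case-insensitive resolution, "id" and "ID" hit the same column
val df2 = df1.select(op_cols_mixed_case.map(col): _*)
df2.printSchema()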
0
votes
1 answer
Issues running Graph queries after upgrading Spark 2.4.3 to 3.1.3
We are upgrading our Scala Spark stack:
Spark from 2.4.3 to 3.1.3
scalaVersion from 2.11.8 to 2.12.10
spark-cassandra-connector from 2.4.2 to 3.1.0
Cassandra version 3.2 and all the subsequent dependencies.
We are facing the following issues:
[error]…

user21166408
- 1
- 1
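The error itself is elided above, but upgrades like this most often break on mixed Scala binary versions. For reference, a build.sbt pinning the version set named in the question (artifact names assumed to be the standard ones):

scalaVersion := "2.12.10"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "3.1.3" % "provided",
  "org.apache.spark" %% "spark-sql"  % "3.1.3" % "provided",
  "com.datastax.spark" %% "spark-cassandra-connector" % "3.1.0"
)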
0
votes
1 answer
How to create a Scala trait which stores data from other columns in a dataset, and then create a new dataset with a column storing the trait?
I am new to Scala and am currently studying datasets for Scala and Spark. Based on my input dataset below, I am trying to create a new dataset (see below). In the new dataset, I aim to have a new column which contains a Scala trait…

AIBball
- 101
- 1
- 1
- 5
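Datasets need an Encoder for every column type, and Spark ships none for an arbitrary trait; a generic Kryo encoder is one way to store trait-typed values (as opaque binary). A sketch with a hypothetical trait:

import org.apache.spark.sql.{Dataset, Encoder, Encoders}

trait Animal { def sound: String }
case class Dog(name: String) extends Animal { val sound = "woof" }

// binary (Kryo) encoding: the column is stored as a single binary blob
implicit val animalEnc: Encoder[Animal] = Encoders.kryo[Animal]

val ds: Dataset[Animal] = spark.createDataset(Seq[Animal](Dog("Rex")))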
0
votes
0 answers
How to integrate IntelliJ and Databricks, as when using JDWP with a regular Spark cluster?
I have been looking online for a while but have found nothing, hence this question. I would like to be able to debug my Apache Spark code (written in Scala) remotely on Databricks, similar to the way it can be done on regular Spark clusters using the…

MrMuppet
- 547
- 1
- 4
- 12
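For context, the regular-cluster technique the question refers to: hand the driver JVM a JDWP agent at submit time, then attach IntelliJ's Remote JVM Debug run configuration to that port. Whether a Databricks cluster exposes such a port is exactly the open part of the question.

// passed via spark-submit, since the driver JVM must see the flag at startup:
//   --conf "spark.driver.extraJavaOptions=-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005"
// then attach IntelliJ's Remote JVM Debug to host:5005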
0
votes
0 answers
Spark - map a UDF over windows in a Spark DataFrame
Problem statement:
Have to group InputDf on multiple columns (accountGuid, appID, deviceGuid, deviceMake) and order each group by time.
Need to check whether the test Df occurs in the exact sequence in each window.
If it exists, create a new…

sujoy majumder
- 1
- 2
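A sketch of the shape of this, with hypothetical column names time and event, assuming testDf is small enough to collect: collapse each group into its time-ordered event list, then test for the sequence as a contiguous run.

import org.apache.spark.sql.functions._
import spark.implicits._

val grouped = inputDf
  .groupBy("accountGuid", "appID", "deviceGuid", "deviceMake")
  .agg(sort_array(collect_list(struct($"time", $"event"))).as("seq")) // ordered by time

val testSeq = testDf.orderBy("time").select($"event".as[String]).collect().toSeq

val hasRun = udf((events: Seq[String]) => events.containsSlice(testSeq))
val result = grouped.withColumn("matched",
  hasRun(expr("transform(seq, x -> x.event)"))) // drop the time field again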
0
votes
0 answers
How to use Google Session Token in Spark to connect to Google Cloud Storage bucket
I want to read data from a Google Storage bucket using a Google session token in a Spark application.
Here, instead of json.keyfile, I want to use the Google session key in the Spark conf options.
I tried with the json.key file, but actually I am looking for Google…

Sachin Patil
- 1
- 3
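One possibility, under a loudly flagged assumption: the Hadoop GCS connector (2.x line) accepts a custom AccessTokenProvider in place of a keyfile, so a class serving the session token can be plugged in. com.example.MyTokenProvider is hypothetical and would implement com.google.cloud.hadoop.util.AccessTokenProvider:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .config("spark.hadoop.fs.gs.impl",
    "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem")
  // assumption: connector 2.x key for pluggable token providers
  .config("spark.hadoop.fs.gs.auth.access.token.provider.impl",
    "com.example.MyTokenProvider")
  .getOrCreate()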
0
votes
1 answer
Add a tag to the list in the DataFrame based on the threshold given for the values in the list in Scala Spark
I have a DataFrame that has a column "grades" containing a list of Grade objects with 2 fields: name (String) and value (Double). I would like to add the word PASS to the list of tags if there is a Grade on the list with the name HOME and a…

xard4sTR
- 25
- 6
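A sketch using Spark 3.x higher-order functions, with a hypothetical threshold of 3.0 and assuming the DataFrame already carries a tags array column:

import org.apache.spark.sql.functions._

val threshold = 3.0
val tagged = df.withColumn("tags",
  when(
    exists(col("grades"), g =>
      g.getField("name") === "HOME" && g.getField("value") >= threshold),
    array_union(col("tags"), array(lit("PASS"))) // appends PASS once
  ).otherwise(col("tags")))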