Questions tagged [spark-window-function]

31 questions
0
votes
1 answer

I want ntile(3) within ntile(3), as in a subdivision within a division by ntile()

I want to create an ntile(3) within an ntile(3). I have the following table: Customer Total_amt Digital_amt 1 100 45 2 200 150 3 150 23 4 300 100 5 350 350 6 112 10 7 312 15 8 260 160 9 232 150 10 190 132 I want to have…
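A minimal PySpark sketch of one way to do this, assuming a DataFrame df with the Customer, Total_amt and Digital_amt columns from the excerpt; the descending orderings are an assumption:

from pyspark.sql import Window
from pyspark.sql import functions as F

# First tercile over all customers by Total_amt (the "division").
w_outer = Window.orderBy(F.col("Total_amt").desc())
df = df.withColumn("division", F.ntile(3).over(w_outer))

# Second tercile computed separately inside each division (the "subdivision").
w_inner = Window.partitionBy("division").orderBy(F.col("Digital_amt").desc())
df = df.withColumn("subdivision", F.ntile(3).over(w_inner))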
0
votes
0 answers

Can I get a specific column value based on an aggregated column?

I am aggregating a table in AWS Glue (Spark SQL) to get the min of created_on and the sum of total. However, I want to group by id, id2, id3 and get the corresponding engment_partn_text value for the earliest created_on date, so that the first 3…
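One common way to keep the value belonging to the earliest created_on is to rank the rows per group with a window before aggregating. A hedged PySpark sketch, assuming a DataFrame df with the columns named in the excerpt:

from pyspark.sql import Window
from pyspark.sql import functions as F

w = Window.partitionBy("id", "id2", "id3").orderBy("created_on")

result = (df.withColumn("rn", F.row_number().over(w))
            .groupBy("id", "id2", "id3")
            .agg(F.min("created_on").alias("min_created_on"),
                 F.sum("total").alias("sum_total"),
                 # value of engment_partn_text on the earliest created_on row
                 F.max(F.when(F.col("rn") == 1, F.col("engment_partn_text")))
                  .alias("engment_partn_text")))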
0
votes
0 answers

Default window frame in Spark SQL

I recently got bitten by the behavior highlighted in this question. My question is why the default behavior is the way it is. Why does merely adding an order by to a window specification change the frame type to range and the end of the range to…
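For context, the behavior being asked about: with an ORDER BY and no explicit frame, Spark uses RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so the aggregate becomes a running one. A small PySpark illustration with assumed column names grp, ts and value:

from pyspark.sql import Window
from pyspark.sql import functions as F

# Default frame once orderBy is present: running sum up to the current row.
w_default = Window.partitionBy("grp").orderBy("ts")

# Explicit frame restoring the whole-partition aggregate.
w_full = (Window.partitionBy("grp").orderBy("ts")
                .rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing))

df = (df.withColumn("running_sum", F.sum("value").over(w_default))
        .withColumn("total_sum", F.sum("value").over(w_full)))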
0
votes
1 answer

Group consecutive rows using Spark Scala with rows repeating

| space_id | template | frequency | day | timestamp …
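The excerpt cuts off before the expected output, but a common approach to grouping consecutive rows is the gaps-and-islands pattern. A PySpark sketch (the question asks for Scala, where the same Window/lag/sum calls apply), using the space_id, template and timestamp columns visible in the header:

from pyspark.sql import Window
from pyspark.sql import functions as F

# Mark rows where template changes from the previous row, then take a running
# sum of those markers so each run of identical consecutive rows shares an id.
w = Window.partitionBy("space_id").orderBy("timestamp")
df = (df.withColumn("changed",
                    F.when(F.lag("template").over(w) == F.col("template"), 0).otherwise(1))
        .withColumn("block_id", F.sum("changed").over(w)))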
0
votes
0 answers

Get a cumulative sum with time interval condition

I have a dataframe with group, value, date_start, and date_end. I want to take the cumulative sum of all values, partitioned by group, that end before the current date_start. Here is what the data looks like: +-----+-----+----------+----------+ |group…
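Because the cutoff compares date_end of earlier rows with date_start of the current row, a plain window frame does not express it directly; one option is a non-equi self-join. A hedged sketch with the column names from the excerpt (it assumes the grouping columns identify each row uniquely enough for the final groupBy):

from pyspark.sql import functions as F

# Rows eligible to be counted for some other row: same group, earlier date_end.
prior = df.select(F.col("group").alias("g"),
                  F.col("value").alias("prior_value"),
                  F.col("date_end").alias("prior_end"))

result = (df.join(prior, (df["group"] == prior["g"]) &
                          (prior["prior_end"] < df["date_start"]), "left")
            .groupBy("group", "value", "date_start", "date_end")
            .agg(F.coalesce(F.sum("prior_value"), F.lit(0)).alias("cum_value")))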
0
votes
0 answers

Spark - Map UDF to windows in a Spark dataframe

Problem Statement: Have to group InputDf based on multiple columns (accountGuid, appID, deviceGuid, deviceMake) and order each group by time. Need to check if the test Df exists in the exact sequence in each window. If it exists, create a new…
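A rough sketch only, assuming the "window" here is each (accountGuid, appID, deviceGuid, deviceMake) group ordered by time, and that both frames share hypothetical time and event columns: collect the ordered events per group and test for the reference sequence with a UDF.

from pyspark.sql import functions as F
from pyspark.sql.types import BooleanType

# Reference sequence, collected to the driver from the (small) test DataFrame.
test_seq = [r["event"] for r in testDf.orderBy("time").collect()]

@F.udf(returnType=BooleanType())
def contains_sequence(events):
    # True if test_seq appears as a contiguous run inside the ordered events.
    n = len(test_seq)
    return any(events[i:i + n] == test_seq for i in range(len(events) - n + 1))

grouped = (inputDf
           .groupBy("accountGuid", "appID", "deviceGuid", "deviceMake")
           .agg(F.sort_array(F.collect_list(F.struct("time", "event"))).alias("rows"))
           .withColumn("events", F.col("rows.event"))
           .withColumn("has_sequence", contains_sequence("events")))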
0
votes
1 answer

Transition matrix from a PySpark dataframe

I have two columns (such as): from to 1 2 1 3 2 4 4 2 4 2 4 3 3 3 And I want to create a transition matrix (where the sum of rows in a column adds up to 1): 1. 2. 3. 4. 1. 0 0 0 0 2. …
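A minimal PySpark sketch, assuming a row-normalized matrix is wanted (flip the normalization if columns should sum to 1 instead); the from/to column names follow the excerpt:

from pyspark.sql import functions as F

# Count transitions, pivot the target states into columns, then divide each
# row by its total so the probabilities in a row add up to 1.
counts = df.groupBy("from").pivot("to").count().na.fill(0)
to_cols = [c for c in counts.columns if c != "from"]
row_total = sum(F.col(c) for c in to_cols)
transition = counts.select("from", *[(F.col(c) / row_total).alias(c) for c in to_cols])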
0
votes
1 answer

Return value from prior year in Spark SQL

I have the following table: YearMonth value 202101 5 202102 7 .... 10 202201 4 I need to create a third column with the value from the previous year. When I have 202201 there should be valuePriorYear equal to 5 (value from…
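A hedged sketch of one way to do this, assuming YearMonth is an integer like 202101 so the same month of the previous year is simply YearMonth - 100 (with gapless monthly data, lag(value, 12) over an ordered window would also work):

from pyspark.sql import functions as F

# Shift each row forward by one year and join it back onto the original rows.
prior = df.select((F.col("YearMonth") + 100).alias("YearMonth"),
                  F.col("value").alias("valuePriorYear"))
result = df.join(prior, "YearMonth", "left")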
0
votes
1 answer

Window function sum, multiplied by condition

I am reviewing some code and would love a bit more clarity. Here is my PySpark Dataframe: YEAR_A YEAR_B AMOUNT 2000 2001 5 2000 2000 4 2000 2001 3 I initiate a window function: window = Window.partitionBy('YEAR_A') Then I would…
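A guess at the pattern being reviewed, since the excerpt stops mid-sentence: a conditional sum over the window, where only rows meeting a condition contribute to the aggregate (the YEAR_A == YEAR_B condition is an assumption):

from pyspark.sql import Window
from pyspark.sql import functions as F

window = Window.partitionBy('YEAR_A')

# Sum AMOUNT over the partition, but only for rows where the condition holds.
df = df.withColumn(
    "amt_same_year",
    F.sum(F.when(F.col("YEAR_A") == F.col("YEAR_B"), F.col("AMOUNT")).otherwise(0)).over(window))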
0
votes
1 answer

Window function based on a condition

I have the following DF: |-----------------------| |Date | Val | Cond| |-----------------------| |2022-01-08 | 2 | 0 | |2022-01-09 | 4 | 1 | |2022-01-10 | 6 | 1 | |2022-01-11 | 8 | 0 | |2022-01-12 | 2 | 1 | |2022-01-13 | 5 …
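The excerpt ends before the expected output, so this is only a guess at the shape of the answer: a window aggregate that counts only the rows satisfying the condition, using the Date, Val and Cond columns shown:

from pyspark.sql import Window
from pyspark.sql import functions as F

w = Window.orderBy("Date")

# Running sum of Val restricted to rows where Cond == 1.
df = df.withColumn(
    "val_when_cond",
    F.sum(F.when(F.col("Cond") == 1, F.col("Val")).otherwise(0)).over(w))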
0
votes
1 answer

To find distinct rows even when the column values are interchanged in SQL

I have a table like this: strt end no of km ------------------------------------ California India 9000 Singapore India 3000 India Frankfurt 6700 Frankfurt Singapore 4500 India …
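A sketch of the usual trick, shown in PySpark for consistency with the rest of the tag (the same least/greatest idea works in plain SQL): build a canonical key from the sorted pair of endpoints so (California, India) and (India, California) collapse together, then keep one row per key.

from pyspark.sql import functions as F

deduped = (df.withColumn("k1", F.least("strt", "end"))
             .withColumn("k2", F.greatest("strt", "end"))
             .dropDuplicates(["k1", "k2"])
             .drop("k1", "k2"))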
0
votes
2 answers

Sum() Window Function in PySpark without defining window spec

I am trying to add a new column "grand total" to my table on each row. E.g.: first_name Order_id price John 1 2.5 Ali 2 2 Abdul 3 3.5 What I want…
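Two hedged ways to attach a grand total to every row, assuming a DataFrame df with the price column from the excerpt: an empty window spec (simple, but it pulls all rows into a single partition) or a cross join against the aggregated total.

from pyspark.sql import Window
from pyspark.sql import functions as F

with_total = df.withColumn("grand_total", F.sum("price").over(Window.partitionBy()))

# Alternative without a window at all:
# with_total = df.crossJoin(df.agg(F.sum("price").alias("grand_total")))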
0
votes
1 answer

Adding a flag based on the occurrence of a repetitive pattern in column categories using PySpark

I have a pyspark dataframe like this: port#| log_date |code 1111 |2022-05-16 08:07:23|AAA 1111 |2022-05-16 08:08:23|XXX 1111 |2022-05-16 08:09:23|BBB 1111 |2022-05-16 08:10:23|CCC 1111 |2022-05-16 08:11:23|YYY 1111 |2022-05-16…
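The excerpt cuts off before the repetition rule is defined, so this is only the usual building block for such flags rather than a full answer: a window per port# ordered by log_date, with lag() exposing the previous code so the flag can be derived from whatever pattern applies.

from pyspark.sql import Window
from pyspark.sql import functions as F

w = Window.partitionBy("port#").orderBy("log_date")
df = (df.withColumn("prev_code", F.lag("code").over(w))
        .withColumn("flag", F.when(F.col("code") == F.col("prev_code"), 1).otherwise(0)))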
0
votes
0 answers

java.lang.IllegalStateException: Error reading delta file, stateful spark structured streaming with kafka

We are running a stateful structured streaming job which reads from Kafka and writes to HDFS, and we are hitting this exception: 17/12/08 05:20:12 ERROR FileFormatWriter: Aborting job null. org.apache.spark.SparkException: Job aborted due to stage…
0
votes
4 answers

Find the top n unique values of a column based on the ranking of another column within groups in PySpark

I have a dataframe like below: df = pd.DataFrame({ 'region': [1,1,1,1,1,1,2,2,2,3], 'store': ['A', 'A', 'C', 'C', 'D', 'B', 'F', 'F', 'E', 'G'], 'call_date': ['2022-03-10', '2022-03-09', '2022-03-08', '2022-03-07',…
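A minimal sketch, assuming "top n" means the n distinct stores with the most recent call_date per region (n = 2 purely as an illustration), and a Spark DataFrame rather than the pandas one in the excerpt:

from pyspark.sql import Window
from pyspark.sql import functions as F

n = 2
# Latest call per (region, store), then rank stores within each region by it.
per_store = df.groupBy("region", "store").agg(F.max("call_date").alias("last_call"))
w = Window.partitionBy("region").orderBy(F.col("last_call").desc())
top_n = per_store.withColumn("rnk", F.dense_rank().over(w)).filter(F.col("rnk") <= n)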