Questions tagged [partition-by]

266 questions
3 votes, 2 answers

Reset Row Number on value change, but with repeat values in partition

I'm having trouble with something very similar to the question "T-SQL Reset Row Number on Field Change". The solution to that question is perfect and works fine, except when I try it with multiple other 'custno' values, it breaks down. What I mean by that: custno…
hello
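The per-custno reset the asker describes is the classic "gaps and islands" problem, usually solved by subtracting two ROW_NUMBERs. A minimal sketch with SQLite's window functions (requires SQLite ≥ 3.25); the table and column names (`t`, `custno`, `seq`, `flag`) are invented for illustration:

```python
import sqlite3

# Sample data: row numbers should reset each time `flag` changes,
# independently within each custno.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (custno TEXT, seq INTEGER, flag TEXT)")
con.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    ("A", 1, "x"), ("A", 2, "x"), ("A", 3, "y"), ("A", 4, "x"),
    ("B", 1, "x"), ("B", 2, "y"), ("B", 3, "y"),
])

# Subtracting a per-(custno, flag) row number from a per-custno row number
# yields a constant within each unbroken run of the same flag. Partitioning
# the final ROW_NUMBER by (custno, flag, grp) restarts the count on every
# change, even when the same flag value reappears later.
result = con.execute("""
    SELECT custno, seq, flag,
           ROW_NUMBER() OVER (PARTITION BY custno, flag, grp
                              ORDER BY seq) AS rn
    FROM (
        SELECT custno, seq, flag,
               ROW_NUMBER() OVER (PARTITION BY custno ORDER BY seq)
             - ROW_NUMBER() OVER (PARTITION BY custno, flag ORDER BY seq) AS grp
        FROM t
    )
    ORDER BY custno, seq
""").fetchall()
```

Note that partitioning the outer ROW_NUMBER by `grp` alone is not enough: different flag values can collide on the same difference, which is exactly the breakage the asker sees with repeated values.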
3 votes, 1 answer

spark: save ordered data to parquet

I have 30 TB of data partitioned by date and hour, split into 300 files each hour. I do some data conversion and then want the data sorted and saved in sorted order for easy ingestion by a C++ program. I understand that when you serialize, the…
user1978816
3 votes, 2 answers

How to find the number of active users for, say, 1 day, 2 days, 3 days... (PostgreSQL)

A distribution of the number of days active within a week: I am trying to find how many members are active for 1 day, 2 days, 3 days, ... 7 days during a specific week (3/1-3/7). Is there any way to use an aggregate function on top of PARTITION BY? If not, what can be used…
KK44
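For this distribution no window function is needed at all: it is an aggregate of an aggregate, expressed with a subquery. A sketch in SQLite with an invented `activity(member, day)` table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE activity (member TEXT, day TEXT)")
con.executemany("INSERT INTO activity VALUES (?, ?)", [
    ("a", "2024-03-01"), ("a", "2024-03-02"), ("a", "2024-03-03"),
    ("a", "2024-03-01"),                       # duplicate log, same day
    ("b", "2024-03-05"),
    ("c", "2024-03-01"), ("c", "2024-03-07"),
])

# Inner query: distinct active days per member within the week.
# Outer query: how many members had each day-count.
dist = con.execute("""
    SELECT days_active, COUNT(*) AS members
    FROM (
        SELECT member, COUNT(DISTINCT day) AS days_active
        FROM activity
        WHERE day BETWEEN '2024-03-01' AND '2024-03-07'
        GROUP BY member
    )
    GROUP BY days_active
    ORDER BY days_active
""").fetchall()
```

The same two-level GROUP BY works unchanged in PostgreSQL.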
3 votes, 1 answer

groupby and join vs window in pyspark

I have a data frame in pyspark which has hundreds of millions of rows (here is a dummy sample of it):

import datetime
import pyspark.sql.functions as F
from pyspark.sql import Window, Row
from pyspark.sql.functions import col
from…
Dan R
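The pyspark question boils down to "window function vs. aggregate-then-join", and the two produce identical rows. A plain-SQL sketch of the equivalence (names invented); the trade-off in Spark is that the window needs the whole partition co-located, while the aggregated side of the join can often be broadcast:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (grp TEXT, val INTEGER)")
con.executemany("INSERT INTO t VALUES (?, ?)", [("a", 1), ("a", 5), ("b", 2)])

# Window version: single pass, every row of a partition processed together.
w = con.execute("""
    SELECT grp, val, MAX(val) OVER (PARTITION BY grp) AS mx
    FROM t ORDER BY grp, val
""").fetchall()

# groupBy + join version: a small aggregated table joined back to the rows.
j = con.execute("""
    SELECT t.grp, t.val, m.mx
    FROM t
    JOIN (SELECT grp, MAX(val) AS mx FROM t GROUP BY grp) m
      ON t.grp = m.grp
    ORDER BY t.grp, t.val
""").fetchall()
```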
3 votes, 2 answers

Scala: collect_list() over Window with keeping null values

I have a data frame like the below:

+----+----+----+
|colA|colB|colC|
+----+----+----+
|1   |1   |23  |
|1   |2   |63  |
|1   |3   |null|
|1   |4   |32  |
|2   |2   |56  |
+----+----+----+

I apply the below instructions such that I create a…
WhoAmI
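Spark's `collect_list` silently drops nulls; the usual workaround is to collect a struct (e.g. of (colB, colC)) so the null survives inside the struct, then unwrap it. A plain-Python sketch of the output such a workaround is aiming for, using the question's sample rows with `None` standing in for null:

```python
from collections import defaultdict

# (colA, colB, colC) rows from the question.
rows = [(1, 1, 23), (1, 2, 63), (1, 3, None), (1, 4, 32), (2, 2, 56)]

# Group by colA, order by colB, and collect colC while KEEPING the nulls,
# which is the behaviour collect_list alone does not give you.
collected = defaultdict(list)
for colA, colB, colC in sorted(rows, key=lambda r: (r[0], r[1])):
    collected[colA].append(colC)
result = dict(collected)
```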
3 votes, 0 answers

Spark partition by key

What is the difference between the two types of partitioning in Spark? For example: I load a text file toto.csv from disk into a Spark cluster:

val text = sc.textFile("toto.csv", 100)

This splits my file into 100 fragments with no "rules". After that, if I do val…
minh-hieu.pham
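The distinction: `textFile(..., 100)` splits by byte ranges and ignores the content, while partitioning by key routes every record through `hash(key) mod numPartitions`, so equal keys always land in the same partition. A toy Python model of what Spark's `HashPartitioner` does:

```python
def hash_partition(pairs, num_partitions):
    # Route each (key, value) pair to partition hash(key) % num_partitions,
    # mirroring Spark's HashPartitioner: same key, same partition, always.
    parts = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        parts[hash(key) % num_partitions].append((key, value))
    return parts

# Both "a" records end up in the same partition, whichever one that is.
parts = hash_partition([("a", 1), ("b", 2), ("a", 3)], num_partitions=4)
```

This key-based placement is what enables later per-key operations (joins, reduceByKey) to run without a further shuffle.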
3 votes, 3 answers

clojure: partition a seq based on a seq of values

I would like to partition a seq based on a seq of values:

(partition-by-seq [3 5] [1 2 3 4 5 6])
((1 2 3) (4 5) (6))

The first input is a seq of split points. The second input is the seq I would like to partition, so that the first list will be…
Thomas Deutsch
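The requested behaviour, close a sublist after each element that matches the next split point, is easy to state imperatively. A Python transcription of the example above (the Clojure version would typically use `reduce` or lazy recursion):

```python
def partition_by_seq(points, xs):
    """Split xs into sublists, closing a sublist after each split point.

    partition_by_seq([3, 5], [1, 2, 3, 4, 5, 6]) -> [[1, 2, 3], [4, 5], [6]]
    """
    out, current, remaining = [], [], list(points)
    for x in xs:
        current.append(x)
        if remaining and x == remaining[0]:
            out.append(current)      # close the sublist at a split point
            current = []
            remaining.pop(0)
    if current:                      # trailing elements after the last point
        out.append(current)
    return out
```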
2 votes, 2 answers

SQL get last column in a group based on order by and value of another column

From the table below, I am trying to create another column (last_k2X) that will show the last key2 where type was set to 'X' over time (ts). In case X was set at the same time (ts) for multiple key2 (in the same key1 partition), the new column last_k2X…
Alg_D
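One way to carry forward "the last key2 whose type was 'X'" is a correlated subquery that looks back in time within the same key1 partition; here same-ts ties are broken by taking the larger key2, which is just one possible tie-break rule. Schema and data are invented to match the question's description; sketched in SQLite:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (key1 TEXT, ts INTEGER, key2 INTEGER, type TEXT)")
con.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", [
    ("k", 1, 10, "X"),
    ("k", 2, 11, "-"),
    ("k", 3, 12, "X"),
    ("k", 3, 13, "X"),   # same ts as the row above: the tie case
    ("k", 4, 14, "-"),
])

# For each row, find the most recent 'X' row at or before its ts,
# within the same key1; ORDER BY ts DESC, key2 DESC resolves ties.
result = con.execute("""
    SELECT key1, ts, key2,
           (SELECT t2.key2 FROM t t2
            WHERE t2.key1 = t.key1 AND t2.type = 'X' AND t2.ts <= t.ts
            ORDER BY t2.ts DESC, t2.key2 DESC
            LIMIT 1) AS last_k2X
    FROM t
    ORDER BY key1, ts, key2
""").fetchall()
```

On engines that support `LAST_VALUE(...) IGNORE NULLS` (SQL Server does not; Oracle and BigQuery do), a window over `CASE WHEN type = 'X' THEN key2 END` achieves the same thing in one pass.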
2 votes, 1 answer

Rank date column based on dates in another table date column

I have the following tables:

df_obs.show()

Item No    Date_Observed
Item 1     2021-09-20
Item 1     2022-12-05
Item 2     2022-10-27
Item 1     2022-09-20
Item 2     2023-02-20
Item 2     2023-03-20
Item 1     2023-01-20

df_purchase.withColumn("rank",…
2 votes, 1 answer

Extracted Date within a Partition By Function

I am trying to run the following SQL query, but BigQuery shows the error "PARTITION BY expression references column date which is neither grouped nor aggregated":

SELECT EXTRACT(MONTH FROM date) month, country_name, SUM(installs), DENSE_RANK()…
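The error arises because, once a query has a GROUP BY, a window's PARTITION BY may only use grouped or aggregated expressions, and the raw `date` column is neither. The fix is to partition by the extracted month and group by it too. Sketched in SQLite, where `strftime('%m', d)` plays the role of `EXTRACT(MONTH FROM date)`; table and data are invented:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE installs_t (d TEXT, country_name TEXT, installs INTEGER)")
con.executemany("INSERT INTO installs_t VALUES (?, ?, ?)", [
    ("2024-01-01", "US", 10), ("2024-01-15", "US", 5),
    ("2024-01-20", "FR", 7),  ("2024-02-01", "US", 3),
])

# Both the GROUP BY and the window PARTITION BY use the extracted month,
# and the window's ORDER BY uses the aggregate, so everything the window
# references is grouped or aggregated.
result = con.execute("""
    SELECT strftime('%m', d) AS month, country_name,
           SUM(installs) AS total,
           DENSE_RANK() OVER (PARTITION BY strftime('%m', d)
                              ORDER BY SUM(installs) DESC) AS rnk
    FROM installs_t
    GROUP BY strftime('%m', d), country_name
    ORDER BY month, rnk
""").fetchall()
```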
2 votes, 1 answer

PARTITION BY with date between 2 dates

I work on an Azure SQL Database (SQL Server). I am trying to get a table with one row per day, but the day is not in the table. I will explain with the example below: TABLE STARTER (date format: YYYY-MM-DD): Date begin Date…
Ouakrat
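A common route when only begin/end dates are stored is to expand each range into one row per day with a recursive CTE; window functions can then partition by the generated day. A sketch in SQLite (SQL Server's recursive CTE syntax is nearly identical, with `DATEADD` in place of `date(d, '+1 day')`); the `starter` table and its columns are invented:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE starter (id INTEGER, date_begin TEXT, date_end TEXT)")
con.executemany("INSERT INTO starter VALUES (?, ?, ?)", [
    (1, "2024-01-01", "2024-01-03"),
    (2, "2024-02-10", "2024-02-10"),
])

# Anchor: each range's first day.  Recursive step: add one day until the
# end date is reached, carrying the id and end date along.
days = con.execute("""
    WITH RECURSIVE days(id, d, date_end) AS (
        SELECT id, date_begin, date_end FROM starter
        UNION ALL
        SELECT id, date(d, '+1 day'), date_end FROM days WHERE d < date_end
    )
    SELECT id, d FROM days ORDER BY id, d
""").fetchall()
```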
2 votes, 1 answer

Write dataframe without column names as part of the file path

I have to write a Spark dataframe to a path of the format: base_path/{year}/{month}/{day}/{hour}/. If I do something like the below:

pc = ["year", "month", "day", "hour"]
df.write.partitionBy(*pc).parquet("base_path/", mode = 'append')

It creates the…
seou1
2 votes, 3 answers

nth group labelling (count only graduates with new group)

Say I have a df:

df <- data.table(user = c('a', 'a', 'b')); df

user
a
a
b

I am looking to create a new column, group_id, at user level:

user  group_id
a     1
a     1
b     2

Is there a fast / scalable way? I can…
Sweepy Dodo
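In data.table itself the idiomatic answers are `df[, group_id := .GRP, by = user]` (ids by group) or `rleid(user)` (ids by consecutive runs). The logic the asker wants, consecutive ids in order of first appearance, can be sketched in a few lines of Python:

```python
def group_ids(users):
    # Assign consecutive ids in order of first appearance, i.e. what
    # data.table's `df[, group_id := .GRP, by = user]` produces.
    seen, out = {}, []
    for u in users:
        if u not in seen:
            seen[u] = len(seen) + 1   # next unused id
        out.append(seen[u])
    return out
```

Note the difference from `rleid`: for input a, a, b, a this yields 1, 1, 2, 1, whereas `rleid` restarts on every new run and yields 1, 1, 2, 3.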
2 votes, 2 answers

How to run SUM() OVER PARTITION BY for COUNT DISTINCT

I'm trying to get the number of distinct users for each event at a daily level while maintaining a running sum for every hour. I'm using Athena/Presto as the query engine. I tried the following query:

SELECT eventname, …
nish
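`COUNT(DISTINCT ...)` is not supported as a window function in Presto/Athena (nor SQLite), so the usual trick is: flag each user's first occurrence per event with ROW_NUMBER, aggregate the flags per hour, then run a cumulative SUM over the hourly totals. A sketch in SQLite with an invented `events` table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (eventname TEXT, hour INTEGER, userid TEXT)")
con.executemany("INSERT INTO events VALUES (?, ?, ?)", [
    ("click", 1, "u1"), ("click", 1, "u2"),
    ("click", 2, "u2"), ("click", 2, "u3"),   # u2 is a repeat, u3 is new
    ("view",  1, "u1"),
])

# Inner: first_seen = 1 only on a user's earliest hour for that event.
# Outer: hourly totals of new users, cumulatively summed per event
# (the default window frame with ORDER BY is cumulative).
result = con.execute("""
    SELECT eventname, hour,
           SUM(SUM(first_seen)) OVER (PARTITION BY eventname
                                      ORDER BY hour) AS running_users
    FROM (
        SELECT eventname, hour, userid,
               CASE WHEN ROW_NUMBER() OVER (PARTITION BY eventname, userid
                                            ORDER BY hour) = 1
                    THEN 1 ELSE 0 END AS first_seen
        FROM events
    )
    GROUP BY eventname, hour
    ORDER BY eventname, hour
""").fetchall()
```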
2 votes, 1 answer

PySpark: PartitionBy leaves the same value in column by which partitioned multiple times

I need to partitionBy in order to get distinct values in the time and match_instatid columns, but it only produces distinct values about half the time. window_match_time_priority =…
Jakub Bares