Questions tagged [partition-by]

266 questions
0
votes
1 answer

Using SQL order by based on two columns

I have a data set that has 2 rows per part (one for 2021, one for 2022) and 16 columns. One of those columns is the Volume loss in dollars for 2022 (the volume loss in dollars for 2021 is always null value). I want to sort the data set by the volume…
L. Cruz
  • 13
  • 1
0
votes
1 answer

Return value from prior year Spark sql

I have the following table: YearMonth value 202101 5 202102 7 .... 10 202201 4 I need to create third column with the value from the previous year. When I have 202201 there should be valuePriorYear equals 5 (value from…
0
votes
0 answers

SQL window function: sum over all past rows where value compared to current row is greater on large table

On a large table (> 1 billion rows), for every timestamp, I want to sum values over rows with a timestamp smaller or equal to the current timestamp where the price in column A (out_price) is smaller than the price in column B (price) from the…
smaica
  • 723
  • 2
  • 11
  • 26
0
votes
0 answers

Spark Conflicting directory structures detected

I have S3 files in the following path…
seou1
  • 446
  • 1
  • 5
  • 21
0
votes
1 answer

Conditional Partitioning of Dataframe

I have a 2-column dataframe such as: col1 | col2 ------------ a1 | b1 ------------ a2 | b1 ------------ a3 | b2 ------------ a1 | b2 ------------ a1 | b3 ------------ I partition this dataframe using a random number…
A.M.
  • 1,757
  • 5
  • 22
  • 41
0
votes
2 answers

Google BigQuery Resources exceeded during query execution. How to split large window frames with partition in SQL

I'm running out of memory with my query on Google BigQuery. I have to calculate multiple window functions like running sums over multiple different time frames. My data mainly consists of an id (string), a value (number), a type ('in' or 'out',…
smaica
  • 723
  • 2
  • 11
  • 26
0
votes
1 answer

SQL/BigQuery: return only the last occurence

I have a table about conversations and different actions for each of…
allamirope
  • 31
  • 4
0
votes
0 answers

Is there a way to use a custom partitionby in spark df writer?

I have a dataset with a milliseconds timestamp field, like create_time of bigint type. And I want to write them into different hdfs directories with a formatted partition suffix like xxx/pt=20220101, xxx/pt=20220102 ... It looks like that…
doki
  • 85
  • 5
0
votes
1 answer

combine average function and row number over partition in one query MySQL

This is the database: id_user nama id_sikap kelancaran tahsin 10000000 Budi B 90 90 10000000 Budi A 89 78 10000000 Budi A 90 90 10000001 Ina A 84 44 I want to combine some conditions in one query such as average (tahsin,…
NJL
  • 1
  • 1
0
votes
0 answers

Oracle JOIN VS PARTITION BY

I maintain a table in Oracle that contains several hundred thousand lines of code, including a priority column, which indicates for each line its importance according to the needs of the…
Zar Tel
  • 19
  • 4
0
votes
1 answer

Update table with row_counter values

I feel like this should be easy but I'm having the hardest time. I'm using SQL server. I'm trying to get the output from the following query to be used in an update statement to add to my table: SELECT cons_id, credited_date, ROW_NUMBER() OVER…
jlvolsch
  • 1
  • 1
0
votes
1 answer

Find first event occurring after given event

I am working with a table consisting of a number of web sessions with various events and event id:s. To simplify my question, let's say that I have 4 columns which are session_id, event_name and event_id, where the event id can be used to order the…
0
votes
1 answer

Select min BeginDate from duplicates

select [BK_Vendedor] ,[NIF] ,[BeginDate] ,[EndDate] from ( select [BK_Vendedor] ,[NIF] ,[BeginDate] ,[EndDate], count(*) over (partition by [BK_Vendedor]) as dc from…
rafamaniac
  • 57
  • 7
0
votes
2 answers

I want to assign rank following with some condition

I want to assign the rank to 'drug_name' as per the order of 'svcdate' for each 'patient_id' in a dataset. (here, to describe the issue I'm only showing one patient_id in the image) select patient_id ,svcdate ,drug_name ,dense_rank()…
0
votes
2 answers

I want to assign the rank to a particular column as per the order of other column for each distinct id

I want to rank 'drug_name' as per the order of 'svcdate' for each 'patient_id'. I have attached sample desired output in the image, To do so I've tried using the following query, select *, dense_rank() over(partition by PATIENT_ID,drug_name order…