I am having a complicated SQL problem in PostgreSQL.
Suppose I have a large table called 'selling_prices'. It contains around 19 million rows. I want to remove some duplicate rows and also update some data. Here is the table structure:
seq | customer_co_cd | item_sku | seliing_tanka_rate | updatedate |
---|---|---|---|---|
1 | 1414343 | sku001 | 0.4 | 2021-01-18 14:34:48 |
2 | 1414343 | sku001 | 0.4 | 2021-01-18 14:34:48 |
3 | 1414343 | sku001 | 0.4 | 2021-01-16 01:34:48 |
4 | 1512333 | sku002 | 0.2 | 2021-01-16 01:34:48 |
5 | 1512333 | sku002 | 0.5 | 2021-01-16 01:34:48 |
and so on....
Condition 1: If the customer_co_cd and item_sku and selling_tanka_rate is same update the latest updatedate to '2021/11/12' and delete the other data.
After the SQL table should be like: delete seq(2,3) and update seq 1
seq | customer_co_cd | item_sku | seliing_tanka_rate | updatedate |
---|---|---|---|---|
1 | 1414343 | sku001 | 0.4 | 2021-11-12 00:00:00 |
Condition 2: If the (customer_co_cd and item_sku) is same and selling_tanka_rate is different then get the data as group
customer_co_cd | item_sku | count |
---|---|---|
1512333 | sku002 | 2 |
I tried some query using group by but it is slow...
SELECT customer_co_cd, item_sku, COUNT(*)
FROM selling_prices
GROUP BY customer_co_cd,item_sku
HAVING COUNT(*) > 1
I don't know how to query the condition 1. Also what is the efficient way to get condition 2. Keep in mind that there are around 19 million data.
Should I create a script or is there a efficient query I can use.