
I currently have a challenge with Snowflake, where I have a PRICE column like the one below. The goal is to "scale" these values while keeping the original sum intact. I do not need to respect the proportions when scaling, but the lowest value should continue to be the lowest and so on. I guess this is really a general SQL problem.

I tried running a script using WIDTH_BUCKET but there is no option to maintain the total sum.

```sql
SELECT
    sale_date,
    price,
    WIDTH_BUCKET(price, 200000, 600000, 5) AS "SALES GROUP"
FROM home_sales
ORDER BY sale_date;
```

```
+------------+-----------+-------------+
| SALE_DATE  |     PRICE | SALES GROUP |
|------------+-----------+-------------|
| 2013-08-01 | 10        |           1 |
| 2014-02-01 | 20        |           2 |
| 2015-04-01 | 30        |           3 |
| 2016-04-01 | 10        |           1 |
| 2017-04-01 | 50        |           4 |
| 2018-04-01 | 60        |           5 |
+------------+-----------+-------------+
```

The output (SALES GROUP column) I am really looking for is like the one below. In this example the total SUM of the PRICE column is 180.

```
+------------+-----------+-------------+
| SALE_DATE  |     PRICE | SALES GROUP |
|------------+-----------+-------------|
| 2013-08-01 | 10        |          12 |
| 2014-02-01 | 20        |          24 |
| 2015-04-01 | 30        |          28 |
| 2016-04-01 | 10        |          12 |
| 2017-04-01 | 50        |          47 |
| 2018-04-01 | 60        |          57 |
+------------+-----------+-------------+
```

If we sum the SALES GROUP column, it still has a total of 180. Note: I know this is not exactly scaling, but I am new to Snowflake and did not find the most correct term for it.

  • Hi - it is not clear to me what the logic/rules are that you are using to calculate the sales group values in your result. Why those values and not any of the many other values that would seem to work? E.g. you could add 1 (or 2, 3, or 4) to the 3 lowest prices and subtract that value from the 3 highest prices and get a result that seems to meet your requirements. – NickW Jul 17 '23 at 11:06
  • It is like you described, but imagine I have a huge number of rows; I wanted to "randomize" these values but keep their total sum intact. – Paulo Masnik Jul 17 '23 at 11:59

1 Answer


How about scaling up the natural log to match the sum?

```sql
select *, (ln(price) * sum(price) over() / sum(ln(price)) over())::int as scaled_price
from t;
```
– Radagast
  • Note: I am assuming prices can't = 0 because ln(0) would be undefined in that case. Use a case/iff expression to handle that if applicable. – Radagast Jul 17 '23 at 14:40
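For what it's worth, here is a minimal sketch of the same log-scaling idea applied to the `home_sales` table from the question (the answer uses a generic table `t`). Per-row rounding can make the total drift by a few units on some data sets, so this version, which is my own addition rather than part of the answer above, folds the rounding residual back into the single highest-priced row so the total always matches SUM(price) exactly:

```sql
-- Sketch only: assumes the question's home_sales table and strictly positive prices
-- (ln(0) is undefined, as noted in the comment above).
-- Same log-scaling as the answer, plus a correction so the total is always exact:
-- the overall rounding residual is added to the single highest-priced row.
WITH scaled AS (
    SELECT
        sale_date,
        price,
        ROUND(LN(price) * SUM(price) OVER () / SUM(LN(price)) OVER ()) AS raw_scaled,
        ROW_NUMBER() OVER (ORDER BY price DESC, sale_date) AS rn
    FROM home_sales
)
SELECT
    sale_date,
    price,
    raw_scaled
        + IFF(rn = 1, SUM(price) OVER () - SUM(raw_scaled) OVER (), 0) AS "SALES GROUP"
FROM scaled
ORDER BY sale_date;
```

With the question's prices (10, 20, 30, 10, 50, 60) the log values sum to roughly 19.01, so each row is multiplied by about 180 / 19.01 ≈ 9.47, giving approximately 22, 28, 32, 22, 37, 39. That keeps the ordering and, for this particular data, already totals 180 even without the residual adjustment.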