3

i have postgresql db with a table t1 and i want to calculate a threshold. the threshold should be for example car 1 uses more fuel than 75 % of all cars, car2 uses more fuel than 50% of all cars, .... mathematically i understand what i want to do, but i dont know how to build the query

id | name | value | threshold
________________________

1  | car1 |  30   |  ...%
2  | car2 |  15   |  ..%
3  | car3 |   7   |
4  | car4 |   5   |

here is a sql fiddle http://sqlfiddle.com/#!15/1e914/1

UPDATE t1
SET threshold = 
    select count(value)
     from t1

where (value > [over each row]) and followed by *100/the overall count()

sorry for that bad try but i am kind of lost. also tried some aggregate functions.

fabvys
  • 413
  • 1
  • 5
  • 12
  • 1
    Have a look at `100 * PERCENT_RANK() OVER (ORDER BY value)` or `100 * CUME_DIST() OVER (ORDER BY value)` – dnoeth Apr 18 '16 at 16:59
  • @dnoeth Both of these do not work because the solution is relative to the most fuel-efficient car and not calculated over the entire population. – Patrick Apr 18 '16 at 17:10
  • 1
    @Patrick: If those functions don't work then your answer is not correct, too. Both are just slight variations of your `RANK/COUNT` :-) – dnoeth Apr 18 '16 at 17:21
  • @dnoeth Not true. I am using `(rank() - 1) / count()` which gives exactly the results in this question. The `- 1` may be slight but it is the reason why you can't use the built-in functions. – Patrick Apr 19 '16 at 03:49
  • @Patrick: Ok, based on the expected result both are not correct, but this was just a description based on four rows :-) That's why I wrote he should see if those functions fit his needs. `PERCENT_RANK` is based on `(RANK-1)/(COUNT-1)`, i.e. the number of rows with a value **less than** the current value and `CUME_DIST` is based on **less than or equal**. – dnoeth Apr 19 '16 at 06:59
  • at the end it is the same like percentile or am i wrong? ntile(100) OVER(ORDER BY value) – fabvys May 04 '16 at 10:25

3 Answers3

5

You can solve this quite elegantly with a window function:

UPDATE t1
SET threshold = sub.thr
FROM (
  SELECT id, 100. * (rank() OVER (ORDER BY value) - 1) / count(*) OVER () AS thr
  FROM t1) sub
WHERE t1.id = sub.id;

The rank() function gives the rank (starting from 1) in an ordered set, in this case over the column value, which is then divided by the total number of rows in the set. Note that count(*) OVER () calculates the total number of rows in th partition but it does not aggregate the rows like a regular count(*) would.

Patrick
  • 29,357
  • 6
  • 62
  • 90
  • thank you all for the help. at the sqlfiddle link above you can see the differences between rank(),percent_rank(),cume_dist(). i think i was looking for percent_rank. i changed the order, now i can say 33% of the cars use more fuel than car2. (by the way my real table is not about cars). one thing to add, window-functions dont work with UPDATE – fabvys Apr 18 '16 at 22:16
  • You can wrap the window functions in a sub-select and then it should work. See updated answer. Note that the use of `percent_rank()` gives a result that is different from what you state in your question. There the percentage is relative to the most fuel efficient car, while the function calculates against all rows. `(rank() - 1) / count(*)` gives exactly the result you were asking for. – Patrick Apr 19 '16 at 03:44
  • true, the (rank() ... is the right function, that works for my question. thanks again. – fabvys Apr 28 '16 at 09:31
1
WITH    q AS
        (
        SELECT  *,
                (RANK() OVER (ORDER BY value) - 1) * 100. / COUNT(*) OVER () nt
        FROM mytable
        )
UPDATE  mytable
SET     threshold = nt
FROM    q
WHERE   mytable.id = q.id
Quassnoi
  • 413,100
  • 91
  • 616
  • 614
0

Do you want to calculate percentage of using fuel?

UPDATE t1 
SET threshold = 
    (select value * 100 / t2.sumValue
    from (select sum(value) sumValue from t1) t2
    )

Or if you want to calculate specific threshold, may you show your expected result?

Pham X. Bach
  • 5,284
  • 4
  • 28
  • 42