0

Problem :

  • Current implementation has only one metric(avg) based on which it shows results.
  • Required implementation needs to accommodate other metric's as well. However, I can't just place them in CASE statements as the queries needs to be sorted in different ways.
  • So, I have two options:
    1. Write different queries for each metric(Though only Query2 of the below structure would contain the only change)
    2. Write the queries for each metric as case statements in Query2 - I am going with this one as I think this is more maintainable.

The structure of what I am trying to do is something like this

--Query1 // that returns some fields based on a timerange(ex : Year1 - Year2 )
--Query2 // I need to manipulate this query to output records based on an input 
         // metric
--Query3 // That joins output from Query1 and Query2

**Existing query:**
WITH const as (
    select  
        /* Constants */
        'Formula' as *costfunction* -- Formula is a string which can take 
                                    -- the below formulae mentioned above  
                                    -- (Formula1 / Formula2/ etc)
    ),
    stats_analysis (
      /* fields to return */ 
    )AS(
      /* Main select query for Query1 */
    ),

--Query2 // Basically extracts out top 5 students based on average

top_students
(
stud_id,
metric_value,
metric_name
) AS (
SELECT
stud_id
, /*Calculation for metric*/AS metric_value
, 'marks' AS metric_name
FROM stats_analysis
GROUP BY stud_id
ORDER BY metric_value DESC
limit 5
)

--Query 3
Uses Query1 and Query2 to display final result

Attempted implementation :Basically I am trying to case the queries that needs to be executed based on different metric's(costFunction).

  • Other Metric's are : Formula1 = ax+b-c; Formula2 = c/x+1,etc;
  • P.S : I have marked *** *** in the code to signify why I need different queries

    top_students
    (
       stud_id,
       metric_value,
       metric_name
    )
    AS (
        SELECT CASE const.costFunction
           When 'Formula1' THEN
           (
               stud_id,
               metric_value,
               metric_name
           ) AS (
           SELECT
            stud_id
            , /***Calculation for Formula1***/ AS metric_value
            , 'marks' AS metric_name
            FROM stats_analysis CROSS JOIN const
            GROUP BY stud_id
            ORDER BY metric_value ***DESC***
            limit 5
           )
           When 'Formula2' THEN
           (
               stud_id,
               metric_value,
               metric_name
           ) AS (
           SELECT
            stud_id
            , /***Calculation for Formula2***/ AS metric_value
            , 'marks' AS metric_name
            FROM stats_analysis CROSS JOIN const
            GROUP BY stud_id
            ORDER BY metric_value ***ASC***
            limit 5
           )
           When 'Formula3' THEN
           (
               stud_id,
               metric_value,
               metric_name
           ) AS (
           SELECT
            stud_id
            , /***Calculation for Formula3***/ AS metric_value
            , 'marks' AS metric_name
            FROM stats_analysis CROSS JOIN const
            GROUP BY stud_id
            ORDER BY metric_value ***DESC***
            limit 5
           )
    )
    

This throws a syntax error with the AS inside the CASE. I am starting out new in PG, so I am open to any better approach to structure this query as well. Thanks !

Edit :

Sample Data

VideoID | StartTime         | EndTime            |Views|TotalTime  |MinTime        |    MaxTime
17276   |2018-09-26 20:33:43| 2018-09-26 20:48:43|  90 |554.2757137|    1.104655658|    25.59161658
17276   |2018-09-26 20:48:43| 2018-09-26 21:03:43|  418|3160.102025|    0.973088008|    167.0388009
17276   |2018-09-26 21:18:44| 2018-09-26 21:33:44|  14 |112.5031557|    0.997863734|    29.2182703
29083   |2018-09-26 20:48:43| 2018-09-26 21:03:43|  419|3552.922446| 0.964971822   |    152.9819936
29083   |2018-09-26 20:33:43| 2018-09-26 20:48:43|  90 |541.1001533|    1.316958002|    27.36436251
29083   |2018-09-26 21:33:44| 2018-09-26 21:48:44|  314|758.0945074|    0.013669366|    1.663391002
29083   |2018-09-26 21:33:44| 2018-09-26 21:48:44|  450|3029.140526|    0.969670667|    139.6291586

Expected Output : will show top N records based on the Aggregation Type,grouped by VideoId and sorted in the order as shown in the examples. Parameters provided : Number of Records(int), Aggregation Type(string)

Ex1 : Input = (2,avg) 

VideoId   | MetricValue
17276       7.33 // Calculated by Sum(Total Time)/Sum(Views)
29083       6.19
Explanation : top 2 by average would mean top 2 with highest avg. i.e:DESC 
Ex2 : Input = (1,max)

VideoId   | MetricValue
29083       1.31    // Calculated by Max(MaxTime) after grouping by ID
Explanation : top 1 by max would mean top 1 with highest MaxTime. i.e:DESC 

Ex3 : Input = (1,min)

VideoId   | MetricValue
29083       0.013669366 // Calculated by Min(MinTime) after grouping by ID
Explanation : top 1 by min would mean top 1 with lowest MinTime. i.e:ASC 
user5566364
  • 173
  • 2
  • 12
  • 2
    A `case` expression returns only one value, with the name given in the alias after the `end`. – Gordon Linoff Oct 19 '18 at 22:46
  • I see. Thanks. In that case, is there an alternative way of implementing this without writing different queries for each metric? – user5566364 Oct 19 '18 at 22:53
  • 2
    That is really hard to say. Without sample data and desired results, I find it really hard to figure out what you are trying to do. – Gordon Linoff Oct 19 '18 at 23:04
  • 2
    Seriously. How do you expect anyone to follow your pseudo code? Forget about that. Just provide **sample data** and **expected result**, if you have real code that works, then provide that as well. You seem to need much more than a `case expression` by the way. – Paul Maxwell Oct 20 '18 at 02:26
  • I have tried down to scale down the requirement to a similar smaller problem and provided the dataset and the expected output. Let me know if it helps now. I would like to highlight it again - The current implementation was just to aggregate by avg , so there was not much need for the case statement. The new requirement is to have the other aggregation types and as a result, I was trying to leverage the existing code. However, notice how the sorting order for min v/s the rest differs as a result of which just case-ing by aggregation types does not seem like a good idea. – user5566364 Oct 22 '18 at 19:20
  • 1) generate a query for only the metric that you want. or 2) let the DBMS generate all the metrics,and let the frontend pick one. or 3) use dynamic SQL to generate&execute exactly the query&metric that you want. – wildplasser Oct 22 '18 at 20:19

0 Answers0