0

I have a query in which I want to return the average, the max and the min of a specific column. However, when I execute twice or more times, the results are different from each other - meaning every time I run the query on the same data set I get different average results.

Why is that?

Heres the code:

   WITH avr AS (
        SELECT  
         ticker_symb, 
         day_sum, 
         cusip, 
         clos_prc, 
         nclos_prc,
              case
                   when clos_prc is null and nclos_prc is not null 
                   then (nclos_prc - LAG( nclos_prc ignore nulls) OVER (Partition by  cusip  ORDER BY cusip asc)) 
                   when  clos_prc is not null and nclos_prc is null
                   then LEAD( nclos_prc ignore nulls) OVER (Partition by cusip  ORDER BY cusip asc)- LAG( naclos_prc ignore nulls) OVER (Partition by cusip  ORDER BY cusip)
                  else NULL
                  end diff
            from  DAILY_SUMMARY 
            where (cusip in (select distinct cusip from thistory where 
        td between to_date('1-JAN-2017') and to_date('10-JUN-2017'))))
            SELECT  ticker_symb, 
            day_sum, 
            cusip, 
            clos_prc, 
            nclos_prc,
            diff,

            AVG(diff) OVER() as avr,
            MAX(diff) OVER() as max_diff,
            MIN(diff) OVER() as min_diff ,

          FROM avr
          where day_sum >'1-JAN-2017'
          ORDER BY cusip;
Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
ana
  • 21
  • 9
  • Maybe I'm misunderstanding but why would they be the same? I assume new data is being written to `daily_summary` between times you run the query. – SQLChao Jun 20 '17 at 19:53
  • no its the same data set – ana Jun 20 '17 at 20:02
  • I am running the query back to back. So there is no difference in the data set – ana Jun 20 '17 at 20:02
  • 3
    Is `cusip` unique? you're ordering on this in your partition clause. if it's not unique then your `partition by` statement is nondeterministic meaning you may get a different ordering each time you run. Try adding a second column to the `order by` statements to ensure uniqueness and so the same ordering will be returned each time. – unfinishedmonkey Jun 20 '17 at 20:26
  • I think you should be ordering by day_sum within the over clauses where you are looking for closing prices. It is not typical to partition by and order by the same column. – Paul Maxwell Jun 20 '17 at 22:35
  • This question is related to all your other questions [here](https://stackoverflow.com/questions/44652319/any-way-around-ora-64307-error) [here](https://stackoverflow.com/questions/44573964/why-query-is-taking-too-long) and [here](https://stackoverflow.com/questions/44551197/avg-function-is-returning-wrong-result). Why don't you take some time and prepare sample data and try to simplify your query as much as possible while it still shows the symptoms. Just prepare [mcve] before you ask. It will greatly improve your chance to get answer. – Marek Vitek Jun 21 '17 at 07:28

1 Answers1

0

I believe you should be ordering by day_sum within the over clauses where you are looking for absent closing prices. It is certainly not typical to partition by and order by the same column and most probably this is the cause of the inconsistency.

Without data to work with I have guessed the ordering seen below, but hopefully it is sufficient for you to trial/test.

WITH avr
AS (
 SELECT
    ticker_symb
  , day_sum
  , cusip
  , clos_prc
  , nclos_prc
  , CASE 
   WHEN clos_prc IS NULL AND nclos_prc IS NOT NULL
    THEN (nclos_prc 
       - LAG(nclos_prc ignore nulls) OVER ( PARTITION BY cusip ORDER BY day_sum DESC )
         )
   WHEN clos_prc IS NOT NULL AND nclos_prc IS NULL
    THEN LEAD(nclos_prc ignore nulls) OVER ( PARTITION BY cusip ORDER BY day_sum ASC ) 
       - LAG(naclos_prc ignore nulls) OVER ( PARTITION BY cusip ORDER BY day_sum DESC )
   ELSE NULL
   END diff
 FROM DAILY_SUMMARY
 WHERE (
   cusip IN (
    SELECT DISTINCT cusip
    FROM thistory
    WHERE td BETWEEN to_date('1-JAN-2017') AND to_date('10-JUN-2017')
    )
   )
 )
select ...
Paul Maxwell
  • 33,002
  • 3
  • 32
  • 51