
I have a SQL table formatted like the following:

8/1/2012   0.111
8/2/2012   0.222
8/5/2012   0.333
.
.
.

I want to run a query that will get the average value in the second column over a specified date range. For dates that do not exist (8/3/2012 and 8/4/2012 in the above table), I want to use the last known value (0.222) for the missing date. How can I go about doing this?
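
For example, averaging over 8/1/2012 through 8/5/2012 in the table above should fill in 8/3 and 8/4 with 0.222, so the expected result would be:

    SELECT (0.111 + 0.222 + 0.222 + 0.222 + 0.333) / 5  -- = 0.222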

Michael Berkowski
Clint Perry
    What DBMS are you using? – mwigdahl Aug 17 '12 at 17:20
  • A pretty general approach is in this link on linear interpolation of missing values in SQL Server: http://stackoverflow.com/questions/3018112/sql-server-interpolate-missing-rows. Should work regardless of whether you have SQL Server or something else. In your case, you'll want to adjust the interpolation function so that you have 100% weight on the lower value and 0% on the higher one. – mwigdahl Aug 17 '12 at 17:35

1 Answer


Edit: Sorry, I think I misunderstood the question the first time. I think this version actually works (tested).

Things I did:

  1. Created a temp table to test with (included so you can test too)
  2. Created a recursive common table expression named Dates, so I can have every date from a given start date to an end date
  3. The query has a subquery that returns, for every date in the range, the latest date in the values table that falls on or before it
  4. That result is joined back to the values table, and the average is calculated

    CREATE TABLE #Table1 (date_column DATETIME, Value DECIMAL(5,4))
    INSERT INTO #Table1 VALUES ('20120804', 0.1234)
    INSERT INTO #Table1 VALUES ('20120808', 0.2222)
    INSERT INTO #Table1 VALUES ('20120809', 0.9876)
    INSERT INTO #Table1 VALUES ('20120812', 0.0505);
    
    -- Recursive CTE: one row per day from 2012-08-04 through 2012-08-30
    WITH Dates (date_column, row)
    AS (
        SELECT CAST('20120804' AS DATETIME) AS date_column, 0 AS row
        UNION ALL
        SELECT DATEADD(DAY, 1, date_column), row + 1
        FROM Dates
        WHERE date_column < '20120830'
    )
    SELECT AVG(Value)
    FROM (
        -- For each date, find the latest date in #Table1 on or before it
        SELECT Dates.date_column,
               (SELECT MAX(date_column)
                FROM #Table1
                WHERE #Table1.date_column <= Dates.date_column) AS maxDateWithValue
        FROM Dates
    ) AllDatesWithLastDateWithValue
    LEFT JOIN #Table1
        ON AllDatesWithLastDateWithValue.maxDateWithValue = #Table1.date_column
    WHERE AllDatesWithLastDateWithValue.date_column >= '20120804'
      AND AllDatesWithLastDateWithValue.date_column <= '20120815'
    

Hope this helps... Nice question...

saul672
  • Thanks for the response. The only problem with the query is that it breaks if there are more than 100 iterations (a time span of more than 100 days). Is there a way around this? I am just above a noob when it comes to SQL, so I apologize for any stupid questions. – Clint Perry Aug 17 '12 at 17:11
  • Sorry, the query had one extra line at the end. By default (I think) the recursive CTE is limited to 100 iterations (to avoid infinite loops)... to get past this, add `OPTION (MAXRECURSION X)` at the end of the query using the CTE; X can be an integer between 1 and 32,767 (in your case about 90 years) – saul672 Aug 17 '12 at 17:27
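
As a sketch of that fix, the hint is appended after the final WHERE clause of the query in the answer (the 366 here is an assumed value sized for about a year of daily dates; for longer spans the CTE's end date would also need extending):

    WHERE AllDatesWithLastDateWithValue.date_column >= '20120804'
      AND AllDatesWithLastDateWithValue.date_column <= '20120815'
    OPTION (MAXRECURSION 366)  -- lifts the default limit of 100 recursion levels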