2

Handling Missing Data, Incomplete data is an issue faced by all Data Miners/ Developers (depending on job function) in the industry.

If I want to convert hourly data (24 records) to daily (1 record), what are the best practices to achieve this.

  1. use having count(*) = 24 in group by clause. So how/when to import those records which were < 24 and if they miss permanently.
  2. Don't use having count(*) = 24 clause, so all the data even if it is < 24 will be imported. So how to re-insert those values later on when they become 24.

Note I have primary key on date/time fields so I need to delete those incomplete records first before re-inserting (which I don't want to do).

Sample data

Hourly Table: PK on Time, ElementNum, RouteName

Time            ElementNum  RouteName   NorthTraffic    SouthTraffic
15-3-14 0:00    Element1    Far Yards   12.3            14.5
15-3-14 1:00    Element1    Far Yards   10              11
15-3-14 2:00    Element1    Far Yards   9               10
15-3-14 3:00    Element1    Far Yards   8.5             9.1
…   ..  ..  ..  ..
15-3-14 21:00   Element1    Far Yards   25              26.2
15-3-14 22:00   Element1    Far Yards   29.3            33.7
15-3-14 23:00   Element1    Far Yards   25.6            27.8
15-3-14 0:00    Element1    JJ Park     80.1            98.8
15-3-14 1:00    Element1    JJ Park     75.3            86
15-3-14 2:00    Element1    JJ Park     70.9            71.1
15-3-14 3:00    Element1    JJ Park     66.5            67.4
…   …   …   …   …
15-3-14 21:00   Element1    JJ Park     112.1           115.5
15-3-14 22:00   Element1    JJ Park     125.4           130.7
15-3-14 23:00   Element1    JJ Park     120             121.3

SQL query to convert Hourly to Daily; executes everyday at 0700 hrs:

SELECT 
    convert(varchar(10), Time, 120),
    ElementNum, RouteName, 
    MAX(NorthTraffic), MAX(SouthTraffic), 
    MAX(NorthTraffic) + MAX(SouthTraffic) TotalTrafficMAX 
FROM  
    HourlyTable 
WHERE 
    Time = '15/03/2014' 
GROUP BY  
    convert(varchar(10), Time, 120), ElementNum, RouteName 
HAVING
    count(*) = 24

Daily Table PK on Date, ElementNum, RouteName

Date    ElementNum  RouteName   NorthTrafficMAX SouthTrafficMAX TotalTrafficMAX
15-3-14 Element1    Far Yards   29.3    33.7    63
15-3-14 Element1    JJ Park     125.4   130.7   256.1

Problem

If the data isn't populated in Hourly Tables before 0700hrs (which is quite common) the daily table will not be populated.

My workaround

  1. Don't add having clause, but the data in daily table will be inaccurate. Then as a 2nd step write stored procedures which will run every hour and updates the MAX column of daily table. This way is too cumbersome. Though I have done this kind of update in many other projects, but here I feel I should have some professional approach and above all next to those columns of MAXTraffic there are many more columns where I have to apply formulas, calculate utilizations, percentage etc. Writing update for each one of them will be a huge challenge.

Need Solution

Is there a real world professional solution for this kind of situations.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459

1 Answers1

0

I have changed where clause of your query, try to execute this script after midnight 12 AM, it will get records for previous day, then you can insert same in any other table.

SELECT convert(varchar(10),Time,120) ,ElementNum ,RouteName 
,MAX(NorthTraffic) 
,MAX(SouthTraffic) 
,MAX(NorthTraffic)+MAX(SouthTraffic) TotalTrafficMAX 
from HourlyTable where DATEDIFF(dd,Time,getdate()) = 1
group by convert(varchar(10),Time,120) ,ElementNum ,RouteName 
having count(*) = 24

Hope I have understood your problem correctly!

AK47
  • 3,707
  • 3
  • 17
  • 36