3

I have been using ColdFusion 2016 and ZingCharts (bundled) to dynamically create charts using SQL Server, with a time series on the X axis. When there are time gaps I would like the line chart to also show a gap, but instead the line is continuous and plots each datapoint consecutively.

A pic of the chart the way it is plotting now, you can see there is no 'gap' between the Oct 29 and March dates, the data just run together: NoDataGap

My data are generally in 15min increments, but there are stretches of time (days or months) where there are gaps in the timeseries and data. I contacted ZingCharts to ask if there was some kind of style tag that controls whether the dates are displayed consecutively or with gaps and there is not. It's something that has to be manipulated at the data-level. If my data were hardcoded I would have to add null values so that the charts would plot with gaps in the timeseries, but my charts are dynamic (a user can choose any number of 7 parameters to add to the chart for a date range they choose). I have found information on how to solve this for hardcoded data, but I'm looking for ideas for solutions for dynamically loaded data/series. I have also found information on a deprecated coldfusion tag for the XML file, isInterpolated="false", but that's no longer an option.

My question is what is the best way to solve this? I found some information about creating a calendar table in SQL Server and unioning that with the table(s) providing the data so that all datetimes would be filled. I was wondering if there's another approach that I'm not thinking of? Thanks for any help, I'm very new at all of this.


Update: Here is the current query for the data, which is a bit complicated. It pulls "Nth" rows based on how many parameters (7 available) are selected and how many days are in the date range:

SELECT
distinct 
 datepart(year, t.sample_date) as [year]
,datepart(month, t.sample_date) as [month]
,datepart(day, t.sample_date) as [day]
,datepart(hour, t.sample_time) as [hr]
,datepart(minute, t.sample_time) as [min]  
,convert(varchar(10), t.sample_date, 1) + ' ' + 
  RIGHT('0' + CONVERT([varchar](2), DATEPART(HOUR, t.sample_time)), 2) + ':' +
  RIGHT('0' + CONVERT([varchar](2), DATEPART(MINUTE, t.sample_time)), 2) AS [datetime] 
,t.stationdesc
<cfif isDefined("form.parameter") and ListFindNoCase(form.parameter, "salinity")>,ROUND(t.salinity,1) as salinity</cfif>
<!---plus 6 more parameters--->
FROM (
SELECT    
    [sample_date]
    ,sample_time
    ,stationdesc
    <cfif isDefined("form.parameter") and ListFindNoCase(form.parameter, "salinity") >,salinity</cfif>
    <!---plus 6 more parameters--->
    , row_number() OVER (ORDER BY streamcode) AS rownum
    FROM MyUnionizedTables
    WHERE stationdesc = (<cfqueryparam value="#form.station#" cfsqltype="cf_sql_varchar">)
    AND [sample_date] BETWEEN (<cfqueryparam value='#Form.StartDate#' cfsqltype="cf_sql_date">) 
    AND (<cfqueryparam value='#Form.EndDate#' cfsqltype="cf_sql_date">)
    <cfif isDefined("form.parameter") and ListFindNoCase(form.parameter, "salinity")>and salinity > -25 and salinity <40 and salinity is not NULL  </cfif>
    <!---plus 6 more parameters--->                           
    GROUP BY sample_date, sample_time, stationdesc, streamcode 
    <cfif isDefined("form.parameter") and ListFindNoCase(form.parameter, "salinity")>,salinity</cfif>
    <!---plus 6 more parameters--->
    ) AS t
WHERE    <!---returning Nth row when record sets (count of days between dates selected) are long--->
    <cfif IsDefined("form.station") AND IsDefined("form.parameter") AND #ParamCount# LTE 3 AND form.station eq 'Coastal Bays - Public Landing' and #ctdays# gte 10> t.rownum % 64 = 0 
    <cfelseif IsDefined("form.parameter") AND #ParamCount# LTE 3 AND #ctDays# gte '5840'> t.rownum % 64 = 0 
        <!---plus lots more elseifs--->
    <cfelseif  IsDefined("form.parameter") AND #ParamCount# GTE 7  AND  #ctDays# gte '350'> t.rownum % 8 = 0
    <cfelse>t.rownum % 1 = 0</cfif>
ORDER BY 
     datepart(year, t.sample_date) 
    ,datepart(month, t.sample_date) 
    ,datepart(day, t.sample_date) 
    ,datepart(hour, t.sample_time) 
    ,datepart(minute, t.sample_time) 

SECOND UPDATE (after Leigh's link to query on GitHub):

So I'd actually been working on a similar query to the one Leigh posted based on the "CTE Expression" section here. I switched to trying to work with her version, which is below. I don't have write edits, so I'm working with an existing table. MyDataTable has ~ 21mil rows, with a separate sample_date(datetime) and sample_time(datetime) [the dates and times are a PITA - b/c of the instruments and the way these data are remotely telemetered we get a datetime column with a 'good date' but a bogus timevalue that we call 'sample_date', and then a separate datetime column called 'sample_time' with a bogus date and a 'good time'.] There are 125 stations, each with data (for example, temperature) from different starting and ending dates/times, beginning in 2001 through present. So I need to fill date/time gaps for 125 different stations with differing gaps of time, that are normally in 15min increments.

--- simulate main table(s)
--CREATE TABLE MyDataTable ( sample_date datetime, sample_time datetime, stationdesc nvarchar, wtemp float)

--- generate all dates within this range
DECLARE @startDate datetime
DECLARE @maxDate datetime
SET @startDate = '2015-01-01'
SET @maxDate = '2016-12-31'

--- get MISSING dates
;WITH missingDates AS
(  
    SELECT DATEADD(day,1,@startDate) AS TheDate
    UNION ALL  
    SELECT  DATEADD(day,1, TheDate) 
    FROM    missingDates  
    WHERE   TheDate < @maxDate  
)
SELECT *
      --[wtemp]
   --  ,[stationdesc]
   --  ,[TIMEVALUE]
FROM   missingDates mi LEFT JOIN MyDataTable t ON t.sample_date = mi.TheDate
WHERE  t.sample_date IS NULL
--and stationdesc = 'Back River - Lynch Point'
--ORDER BY timevalue
OPTION  (MAXRECURSION 0)

When I run this query as-is I get only 17 rows of data. TheDate column lists datetimes with dates 12/15-12/31/16 and all times are 00:00:00.000. Query takes 49s.
enter image description here


Meanwhile, my coworker and I have been working on alternate methods.

--Putting data from only 1 station from our big datatable into the new testtable called '_testdatatable'

SELECT        station, sample_date, sample_time, wtemp, streamcode, stationdesc, TIMEVALUE
INTO              _testdatatable
FROM            MyBigDataTable
WHERE        (stationdesc = 'Back River')
order by [sample_date],[sample_time]

--Next, make a new table [_testdatatableGap] with all time values in 15min increments from a datetime table we made
SELECT [wtemp]=null
      ,[streamcode]='ABC1234'
      ,[stationdesc]= 'Back River'
      ,[TIMEVALUE]
      into [tide].[dbo].[_testdatatableGap]
  FROM DateTimeTable
  WHERE  (TIMEVALUE BETWEEN '4/19/2014' AND getdate())

--Then, get the missing dates from the Gap table and put into the testdatatable
INSERT into [_testdatatable]
      (  [wtemp]
        ,[streamcode]
        ,[stationdesc]
        ,[TIMEVALUE] 
)
    (SELECT 
       [wtemp]=null -- needs this for except to work
      ,
      [streamcode]
      ,[stationdesc]
      ,
      [TIMEVALUE] 
  FROM [_testdatatableGap]   
EXCEPT   
SELECT 
       [wtemp]=null -- needs this for except to work
      ,
    [streamcode]
      ,[stationdesc]
      ,
      [TIMEVALUE] 
  FROM [_testdatatable])

This method worked to create a table with all the 15min increments in date/time, which resulted in a correctly drawn chart (below). However, we don't know how to scale this up to the full 125 station full data table without making multiple tables.

CorrectDataGaps

Becalecca
  • 203
  • 1
  • 12
  • It really requires generating `null` values for all the missing points? :/ How much data do you normally chart? What does the base query look like? – Leigh Dec 07 '16 at 19:21
  • heh, well it's 96 datapoints per day * 365 days * max of 16 years so far, and they can load between 1 and 7 parameters - so I set up a query that pulls 'Nth' rows, with the 'N' dependent on how many parameters they've chosen and how many days in their date selection. I'll add the query to the original question b/c it's lengthy. – Becalecca Dec 07 '16 at 19:35
  • 1
    (Edit) Hm... wondering if you could get away with just a single null per (day?) or if you really need to generate one per missing element. I am sure the missing elements could be generated in SQL, it just a question of how and how efficiently (ie calendar table or cte). – Leigh Dec 07 '16 at 19:47
  • My coworker just discovered cte just now and made a table pretty quickly...would I just union that with my datatable(s)? But yes, that's something very interesting to think about, I could try it with only 1 null per day. – Becalecca Dec 07 '16 at 19:52
  • 1
    And yes, the literal answer was "If you want to break the line you need to but a null in there. [1,2,3,null,5,6,7]". I'm shocked there's not a better solution. – Becalecca Dec 07 '16 at 19:56
  • 1
    Normally I use that kind of CTE in an outer JOIN with another table(s). So the results always include the necessary dates. So *conceptually*, the result is something like this `SELECT cte.SomeDate, ot.SomeValue FROM cte LEFT JOIN otherTable ot ON ot.SomeDate = cte.SomeDate` , ie The date is always populated and the values end up as `null` for missing dates. – Leigh Dec 07 '16 at 19:58
  • Thanks, this gives me something to work with, I'll report back. – Becalecca Dec 07 '16 at 20:09
  • 1
    (Edit) Okay, if you want to try the "single null per day" a UNION with this type of [CTE](http://stackoverflow.com/questions/14626473/cte-to-get-dates-between-two-dates-using-sql-server) would probably work better. See example https://gist.github.com/anonymous/14346ecb3f502cb3d6653a230b73ee0c . Just UNION the result with the existing dates. – Leigh Dec 07 '16 at 20:13
  • 1
    Calendar tables are very useful but will not completely solve your problem. You need to plot values at the datetime level. If you see other potential uses for a calendar table (fiscal year reporting for example), build and maintain one. Keep it separate from the time of day problem. Solve that one with another table, table function, or subquery. – Dan Bracuk Dec 08 '16 at 11:56
  • I'm still working on this, but not having much luck because I need to have some kind of 'station'/'station id' in there as well. I successfully joined 3 datatables with a 4th datetime table. But since the datetime table (in 15min increments) has null values for all the other parameters when I pull the data based on station and daterange it doesn't fill in the missing datetimes. Short of creating a datetime table for all 120+ stations, I don't know what to do. I feel like I'm missing something... – Becalecca Dec 09 '16 at 14:35
  • 1
    @Becalecca - (Did not see your last comment. With a few exceptions, SO only notifies participants of new comments if you use '@'+ username.). The issue makes sense conceptually, but not sure I am following the new JOIN's. An OUTER JOIN (or possibly UNION ALL) should work here. Any way you can dummy up a small example [like this one](https://gist.github.com/anonymous/14346ecb3f502cb3d6653a230b73ee0c) that demonstrates the problem? – Leigh Dec 13 '16 at 14:09
  • @Leigh, thanks. I added an update to my question - I tried your query and don't seem to be getting the right result. My coworker made a test table using a different method which seems to work (but not sure how to scale up to multiple stations without separate tables), and when I chart the data it puts correct data gaps with correct spacing based on missing datetimes. I tried charting data with only a single value for a missing date and it does provide a gap, but does not space the data correctly. – Becalecca Dec 14 '16 at 15:17
  • 1
    *don't seem to be getting the right result.* Yes, not sure if you saw [my comment](http://stackoverflow.com/questions/41024517/make-chart-gaps-in-zingchart-when-missing-dates-in-dynamically-loaded-data?noredirect=1#comment69260289_41024517), but it is because that specific query was designed to be used differently. It only returns missing dates. So it would need to be combined with a UNION/ALL (to include the existing dates) rather than a JOIN. Also, it only returns a single date, which you said did not generate the correct spacing anyway, so we can probably scratch that idea ;-) – Leigh Dec 14 '16 at 16:25
  • 1
    @Becalecca - Here is [an example using an OUTER JOIN](https://gist.github.com/anonymous/4b082589baf6ed86053c78ed46d2bd10). The demo uses CTE's to generate the dates and times, but longer term you might want to make those permanent tables instead. I'd also suggest taking a look at the execution plans to determine which offers better performance: a single calendar table (both date and time) or separate tables. – Leigh Dec 14 '16 at 17:38
  • @Becalecca - How did things work out with the SQL? :) I was thinking given the large amount of data involved, perm calendar tables would offer better performance than the POC with CTE's, but have not tested it. – Leigh Dec 19 '16 at 17:16
  • @Leigh I've been working on it, I'll have a big update soon on results from your query as well as another method (using a #temptable) that is working well. But things got a bit more complicated when I realized that I've been shooting myself in the foot with adding in an Nth rows part to the query. I was refilling the rows I removed on purpose with null data...so working on 'refining' my approach. – Becalecca Dec 20 '16 at 20:19
  • @Becalecca - Oooh ... lol. Yeah, did not think about that, but makes total sense. The nth row filters need to be applied after union-ing the data together. – Leigh Dec 20 '16 at 21:18
  • @Leigh - Yes! But when I do that and chart the data it's now putting breaks in between every single 'datetime' gap in the data, which is NOT what it was doing before. Trying to figure out what's going on! – Becalecca Dec 21 '16 at 13:40
  • One thing I was confused about was the [timevalue] column. It sounds like it contains both a date and time, but ... the filters use a date only, ie '5/29/2014' . – Leigh Dec 21 '16 at 19:11
  • @Leigh - Yes, users choose a date range by date only from the [timevalue] column, I was worried about that but it seems to work fine. – Becalecca Dec 22 '16 at 12:28
  • 1
    @Becalecca - So that column contains a date *only*? If so, then never mind :) However, the reason I ask is that if `[timevalue]` contained both a date and time ie 05/29/2014 17:15:00, then filtering on just a date, ie '05/29/2014' would end up excluding most of the records for that day. I typically use the approach [at the end of this thread](http://stackoverflow.com/questions/17575362/date-format-and-sql-query-clarification/17577274#17577274), instead of `between`, because it works with both "date only" and "date and time" columns. – Leigh Dec 22 '16 at 15:41
  • @Leigh - That is super helpful, thank you! I will switch to using that approach because you are correct in that [timevalue] is a datetime column and I was only filtering on date. Thanks! – Becalecca Dec 22 '16 at 16:34
  • 1
    @Becalecca - Awesome, and thanks for posting the resolution. This turned out to be one of the more interesting charting questions I have come across in a while :) Cheers. – Leigh Dec 22 '16 at 18:27

1 Answers1

1

After working through several suggestions, and a lot of research, trial and error I think I’ve solved my problem. I need to work on my additional complication of sometimes needing to reduce the volume of data returned and graphed, but that part is sort of outside the realm of my original question.

The short version of my answer is:

  1. Made a table view of MyBigDataTable with an additional column which is a datetime column called “TIMEVALUE”.

  2. Made a big permanent datetime calendar table with the datetime column called the same: “TIMEVALUE”.

  3. I then developed a set of SQL queries that

(a) gather data from MyBigDataTable and put it into a #temptable, and

(b) also gathers datetimes from the calendar table and puts it into the same #temptable.

Then, (c) because now there will sometimes be 2 datetime rows, one with data and one with nulls, I run a query to only keep the row with data if there are 2 rows of matching datetime and station. This data can then be charted.

  1. This is all now written dynamically in my .cfm page, station, date range and parameters are chosen by a user and a chart is now successfully drawn with correct ‘gaps’ in the datetimes for times of missing data.

Here’s SQL (here, limited to only 1 parameter, but I have 8):

--Step 1. Check if the temptable exists, if it does then delete it
IF OBJECT_ID('tempdb..#TempTable') IS NOT NULL
BEGIN
DROP TABLE #TempTable
END
;
--Step 2. Create the temptable with data from the parameters, station and dates selected on the .cfm 
SET NOCOUNT ON

SELECT 
     timevalue
    ,stationdesc
    ,wtemp
INTO #TempTable

FROM MyBigDataTable
WHERE 
    stationdesc = 'Station01'
    and [timevalue] BETWEEN '5/29/2014' AND '10/01/2016'
GROUP BY 
    TIMEVALUE
    ,stationdesc
    ,wtemp
;
--Step 3. Now select datetimes from a big calendar table, and set stationdesc to the selected station, 
--and rest of parameters to null. And do this for the same selected date range
INSERT INTO #TempTable
SELECT 
[TIMEVALUE] 
,[stationdesc]= 'Station01' 
,wtemp=null
FROM MyDatetimeCalendarTable
WHERE  [timevalue] BETWEEN '5/29/2014' AND '10/01/2016'
;
--Step 4. Run query on the temptable to gather data for chart, but b/c sometimes there will be 2 rows with the same datetime and station but one with data and one with nulls, this query only gathers the row with data if there are 2 rows with matching datetime and station
SELECT distinct *
FROM #TempTable a
WHERE 
wtemp is not null or
    wtemp is null and 
    not exists(
        SELECT * FROM #TempTable b
        WHERE a.timevalue=b.timevalue 
and a.stationdesc=b.stationdesc and b.wtemp is not null)
ORDER BY timevalue
;

I need to fully test it and make some amendments, but I think this satisfies the requirements of an answer, because so far it's doing what I need it to do. Thank you to @Leigh and @Dan Bracuk for their wisdom (and patience!)

Becalecca
  • 203
  • 1
  • 12