Selecting an appropriate period or max/minimum period when outside of the period set

Question

I'm trying to see if it's possible to efficiently select a period a given date belongs to.

Let's say I have a table

id<long>|period_start<date>|period_end<date>|period_number<int>

and lets say I want for every id the period that "2013-11-20" belongs to.

i.e. naively

 select id, period_number 
 from period_table 
 where '2013-11-20' >= period_start and '2013-11-20' < period_end

However, if my date is beyond any period_end or before any period_start, it won't find this id. In those cases I want the minimum (if before the first period_start) or the maximum (if after the last period_end).

Any thoughts if this can be done efficiently? I can obviously do multiple queries (i.e. select into the table as above and then do another query to figure out the min and max periods).

So for example

+--+------------+----------+-------------+
|id|period_start|period_end|period_number|
+--+------------+----------+-------------+
|1 |2011-01-01  |2011-12-31|1            |
|1 |2012-01-01  |2012-12-31|2            |
|1 |2013-01-01  |2013-12-31|3            |
+--+------------+----------+-------------+

If I want what period 2012-05-03 belongs to, my naive sql works and returns period #2 (1|2 as the row, id, period_number). However, if I want what period 2014-01-14 (or 2010-01-14) it can't place it as it's outside the table.

Therefore since "2014-01-14" is > 2013-12-31, I want it to return the row "1|3" if I chose 2010-01-14, I'd want it to return 1|1, as 2010-01-14 < 2011-01-01.

The point of this is that we have a index table that keeps track of different types of periods and what is their relative value (think quarter, half year, years) for many different things and they all don't line up to normal years. Sometimes we want to say we want period X (some integer) relative to date Y. If we can place Y within the table and figure out Y's period_number, we can easily do the math to figure out what to add/subtract to that value. If Y is outside the bounds of the table, we define Y to be the max/min of the table respectively.

Can you please post some sample data and the expected result? Also what RDBMS (SQL Flavor) are you using? — PM 77-1, Jan 15 '14 at 00:17
Please post the **desired** result based on your sample data. What **exactly** should the output be for `2010-01-14` and `2014-01-14`? — PM 77-1, Jan 15 '14 at 00:28
I think you should describe the overall problem your trying to solve. What exactly are you trying to do? Your query will not work for 2014 because simply the record is not there. If you insert it, end of problem. — Menelaos, Jan 15 '14 at 00:31
something that is rumbeling around inside my head is to handle this at data ingestion time and include a fake record that is dated from mindate->first record date and from last record date->maxdate. The Q then is, how to keep that in sync with live updates and make sure its accurate. — spotter, Jan 15 '14 at 01:22
Not assuming indexes at the moment, but everything is up for grabs in terms of that (though db server currently has a limit of one index column per table, supposedly changing) — spotter, Jan 15 '14 at 06:21
Also I should note that in the example above I only have a single id. In practice we have tens to hundreds of thousands of them. Part of me is pondering just keeping a cache of the min/max periods for each id and then trying to place the date within the table for each id, and if I can't, use what I have from my cached min/max records. I'm going to sleep on this and see if the answer below can help me clarify some points better — spotter, Jan 15 '14 at 06:25

Menelaos · Answer 1 · 2014-01-15T01:56:39.370

It seems that you want to print the first day, and last day of a given date's year.

So:

For 2012-05-03, print out 1|2012-01-01|2012-12-31|2
For 2014-01-14, print out 1|2014-01-01|2014-12-31|4

Solutions:

Create an insertion script that inserts all the rows for years up to 2100 (you can easily add more).
Instead of using a table, simply use Date Functions (or any other programming language).

Using SQL Date Functions

Take the date given, and print out all four needed columns using date functions.

Example (MySQL) - should work with INGRES with some modifications

Note: Should work in INGRES as I checked functions and they are the same:

http://community.actian.com/wiki/DATE_FORMAT_scalar_function

http://community.actian.com/wiki/Ingres_DBMS_function_list

   SELECT 1 AS id,
   CONCAT(DATE_FORMAT('2014-01-14', '%Y-'), '01-01') AS period_start,
   CONCAT(DATE_FORMAT('2014-01-14', '%Y-'), '12-31') AS period_end,
   DATE_FORMAT('2014-01-14', '%Y')-2010 AS period_number ;

SQLFiddle: http://sqlfiddle.com/#!2/d41d8/29204

UPDATE

A UNION should then work fine for you. Modify below as appropriate for INGRES.

SET @input =  '2015-01-14';

(SELECT id,
          period_number
   FROM period_table
   WHERE @input >= period_start
     AND @input < period_end)
UNION
  (SELECT id,
          period_number
   FROM period_table
   WHERE @input > (select MAX(period_end) FROM period_table) 
   order by period_end desc limit 1
  )
UNION
  (SELECT id,
          period_number
   FROM period_table
   WHERE @input < (select MIN(period_start) FROM period_table) 
   order by period_start asc limit 1
  )

SQLFiddle: http://sqlfiddle.com/#!2/643da/3

ALSO:

SET @input =  '2014-01-14';

SELECT id,
          period_number

   FROM period_table
   WHERE 
     (@input >= period_start
     AND @input < period_end)
     OR 
     (
       @input > (select MAX(period_end) FROM period_table) AND
       period_number= (select MAX(period_number) FROM period_table)
     )
      OR 
     (
       @input < (select MIN(period_start) FROM period_table) AND
       period_number= (select MIN(period_number) FROM period_table)
     )

sorry, not really what I want, the table that I showed, isn't the output, but the source data. i.e. what I'm searching against. The period_start/end columns don't have to be beg/end of year, but can be any random data (just so that they just define adjacent periods. — spotter, Jan 15 '14 at 01:12
yea, this is something that I've been thinking of, the union. not so sure how well it will perform in the context of what we have (reasonably large tables, millions of entries). — spotter, Jan 15 '14 at 15:05

score 0 · Answer 2 · answered Jan 15 '14 at 03:10

Note: I missed the database engine you were using, so I answered from the perspective of SQL Server. However, the query is pretty simple and you should be able to adapt it to your own needs.

The best I can come up with is, if your table is clustered (or at least indexed) on FromDate, a query that works in 2 seeks:

DECLARE @SearchDate datetime = '4062-05-04';

SELECT TOP 1 *
FROM
   (
      SELECT TOP 2
         Priority = 0,
         *
      FROM dbo.Period
      WHERE @SearchDate >= FromDate
      ORDER BY FromDate DESC
      UNION ALL
      SELECT TOP 1
         2,
         *
      FROM dbo.Period
      WHERE @SearchDate < FromDate
      ORDER BY FromDate
   ) X
ORDER BY Priority, FromDate DESC
;

See a Live Demo at SQL Fiddle

If you will post more information about your table structure and indexes, it's possible I can advise you better.

I also would like to suggest that if at all possible, you stop using inclusive end dates, where your ToDate column has the last day of the period in them such as '2013-12-31', and start using exclusive end dates, where the ToDate column has the beginning of the next period. The reason for this is usually only apparent after long database experience, but imagine what would happen if you suddenly had to add periods that were shorter than 1 day (such as shifts or even hours)--everything would break! But if you had used exclusive end dates all along, everything would work as is. Also, queries that have to merge periods together become much more complicated because you are adding 1 all over the place instead of doing simple equijoins such as WHERE P1.ToDate = P2.FromDate. I promise you that you will run an enormously greater chance of regretting using inclusive end dates than you will exclusive ones.

Clint Good · Answer 3 · 2014-01-15T05:43:03.937

I'm going to assume you're using SQL server as well for the purposes of the script below but the SQL is generic and should work with any (?) database as long as the column name delimiters are correct

declare @Period_Table table 
(
    [id] int not null,
    period_start datetime not null,
    period_end datetime not null,
    period_number int not null,
    primary key ([id], period_number)
)

insert into @Period_Table values (1, '2011-01-01', '2011-12-31', 1)
insert into @Period_Table values (1, '2012-01-01', '2012-12-31', 2)
insert into @Period_Table values (1, '2013-01-01', '2013-12-31', 3)

declare @TestDate datetime
declare @TestId int 
set @TestId = 1 -- yearly
set @TestDate = '2012-05-03'
set @TestDate = '2014-01-14'
set @TestDate = '2010-01-14'

select *
  from @Period_Table pt
  where pt.[id] = @TestId and
    (
      pt.period_start <= @TestDate or 
      @TestDate < pt.period_start and
      not exists
      (
        select 1
          from @Period_Table pt2
          where pt.[id] = pt2.[id] and pt2.period_start < pt.period_start
      )
    ) and
    (
      pt.period_end >= @TestDate or 
      @TestDate > pt.period_end and
      not exists
      (
        select 1
          from @Period_Table pt2
          where pt.[id] = pt2.[id] and pt2.period_end > pt.period_end
      )
    )

It wasn't specified at the time. As I mentioned the Select statement, with appropriate table names will work — Clint Good, Aug 13 '14 at 22:07

score -1 · Accepted Answer · answered Aug 13 '14 at 16:00

Why aren't you creating "boundary periods"? Choose arbitrary beginning_of_time and end_of_time dates e.g. 01/01/0001 and 31/12/9999 and insert a fake period. Your example period_table will become:

+--+------------+----------+-------------+
|id|period_start|period_end|period_number|
+--+------------+----------+-------------+
|1 |0001-01-01  |2010-12-31|1            |
|1 |2011-01-01  |2011-12-31|1            |
|1 |2012-01-01  |2012-12-31|2            |
|1 |2013-01-01  |2013-12-31|3            |
|1 |2014-01-01  |9999-12-31|3            |
+--+------------+----------+-------------+

In this case, any query will retrieve one and only one row, e.g:

select id, period_number from period_table 
where '2013-11-20' between period_start and period_end

+--+-------------+
|id|period_number|
+--+-------------+
|1 |2            |
+--+-------------+

select id, period_number from period_table 
where '2010-11-20' between period_start and period_end

+--+-------------+
|id|period_number|
+--+-------------+
|1 |1            |
+--+-------------+

select id, period_number from period_table 
where '2014-11-20' between period_start and period_end

+--+-------------+
|id|period_number|
+--+-------------+
|1 |3            |
+--+-------------+

that's actually what I ended up doing (well, something very similar). — spotter, Aug 13 '14 at 19:10
@spotter It is a classic approach. The only caveat is to manage correctly the periods, for example modifying one period might trigger additional changes to maintain the integrity of the "chain" — Adrian, Aug 14 '14 at 10:39