2

I would like to discuss the "best" way to storage date periods in a database. Let's talk about SQL/MySQL, but this question may be for any database. I have the sensation I am doing something wrong for years...

In english, the information I have is:

-In year 2014, value is 1000
-In year 2015, value is 2000
-In year 2016, there is no value
-In year 2017 (and go on), value is 3000

Someone may store as:

BeginDate   EndDate     Value
2014-01-01  2014-12-31  1000
2015-01-01  2015-12-31  2000
2017-01-01  NULL        3000

Others may store as:

Date        Value
2014-01-01  1000
2015-01-01  2000
2016-01-01  NULL
2017-01-01  3000
  • First method validation rules looks like mayhem to develop in order to avoid holes and overlaps.
  • In second method the problem seem to filter one punctual date inside a period.

What my colleagues prefer? Any other suggestion?

EDIT: I used full year only for example, my data usually change with day granularity.

EDIT 2: I thought about using stored "Date" as "BeginDate", order rows by Date, then select the "EndDate" in next (or previous) row. Storing "BeginDate" and "Interval" would lead to hole/overlap problem as method one, that I need a complex validation rule to avoid.

TNT
  • 819
  • 1
  • 8
  • 28

1 Answers1

0

It mostly depends on the way you will be using this information - I'm assuming you do more than just store values for a year in your database.

Lots of guesses here, but I guess you have other tables with time-bounded data, and that you need to compare the dates to find matches.

For instance, in your current schema:

select *
from other_table ot
inner join year_table yt on ot.transaction_date between yt.year_start and yt.year_end

That should be an easy query to optimize - it's a straight data comparison, and if the table is big enough, you can add indexes to speed it up.

In your second schema suggestion, it's not as easy:

select *
from other_table ot
inner join year_table yt 
 on ot.transaction_date between yt.year_start 
 and yt.year_start + INTERVAL 1 YEAR

Crucially - this is harder to optimize, as every comparison needs to execute a scalar function. It might not matter - but with a large table, or a more complex query, it could be a bottleneck.

You can also store the year as an integer (as some of the commenters recommend).

select *
from other_table ot
inner join year_table yt on year(ot.transaction_date) = yt.year

Again - this is likely to have a performance impact, as every comparison requires a function to execute.

The purist in me doesn't like to store this as an integer - so you could also use MySQL's YEAR datatype.

So, assuming data size isn't an issue you're optimizing for, the solution really would lie in the way your data in this table relates to the rest of your schema.

Neville Kuyt
  • 29,247
  • 1
  • 37
  • 52
  • I thought in second method to not store a interval (days, years, or another timespan) to avoid hole/overlap problem. Maybe you could order rows by Date, then select the "EndDate" in next (or previous) row. I know, this doesn't seem efficient either. – TNT Jan 16 '17 at 14:24