SQL Sort data into age bins based on date of birth and timestamp of datapoint

Question

I'm new to sql, and stackoverflow, so please show me mercy. I have 3 tables (described below). I have searched a lot for this specific problem and have not found a solution. I'm a graduate student studying the behavior and psychology of kids as they develop over years. I do this by observing an individual kid's behavior for 10 minutes and recording what they're doing every minute. I do this for multiple kids of different ages, and I follow each kid many times over many years. So the data are mixed-longitudinal (the same individuals are represented by multiple data points over the years). I should also note that I'm currently using Access.

The 10 minute bout is called a "follow," and the activity at each minute is called a "scan." Over the years, I've entered this data into a single Excel sheet with date of follow, timestamp of scan, individual ID, activity, and some other things. I've since broken this up into two tables: a follow_id table, which includes information such as a unique follow ID number, date of each follow, and the ID of the individual followed. I then made a second table with every single scan, and each scan has it's associated follow_id number, which is the unique ID number for the follow that it's from. I hope that makes sense, I tried typing up the tables but I can't get the formatting right. Please forgive my n00bness.

What I need to do is sort out all the scans for, say, infants below 1 year of age, and then between 1-2 years of age, and so on. This has been difficult because nearly all of my subjects are represented in multiple age bins. In other words, I may have followed individual A at 9 months old, then again at 16 months old. So I made a birthday table, which has each individual's ID and their birthdays.

I've tried a bunch of subquerying and joins and whatnot but it's clear that I don't really know what I'm doing. I would really appreciate if someone could point me in the right direction. For instance, should I start with a join? Or could this all work with just subquerying? Any insight would help and be greatly appreciated.

tbl_biography

 +------------------------------+
 | individual_id |  birth_date  |
 +------------------------------+
 |      AA       |  2016-01-01  |
 |      BB       |  2013-01-01  |
 |      CC       |  2014-01-01  |
 +------------------------------+

tbl_follow_id

 +-------------------------------------------+
 | follow_id | individual_id  | follow_date  |
 +-------------------------------------------+
 |   0001    |      BB        | 2013-12-12   |
 |   0002    |      BB        | 2018-01-01   |
 |   0003    |      BB        | 2015-01-01   |
 |   0004    |      CC        | 2016-01-01   |
 |   0005    |      AA        | 2017-01-01   |
 +-------------------------------------------+

tbl_scan_id

  -follow_id: 0001; 0001; 0001; 0002; 0002; 0002; 0003; 0003; 0003; 0004; 0004; 0004; 0005; 0005; 0005

  -scan_id: 00001; 00002; 00003; 00004; 00005; 00006; 00007; 00008; 00009; 00010; 00011; 00012; 00013; 00014; 00015

  -timestamp: (I don’t think this really matters here, but each scan would have hh:mm)

  -scan: various behaviors such as REST, PLAY, EAT, etc.

I should point out that you can see that individuals AA and BB are represented in the dataset in 2 separate follows each, and they are different ages for each follow. The question is, how do I filter out, for example, all scans for 0-1 year olds, 1-2 year olds, etc? For now, it would be fine to pool all individuals together. But in the future it would be amazing to be able to do this by certain individuals, say based on sex, mother, and other variables that I also have in the biography table. Thanks everyone in advance.

It will be helpful if you can give an example of the query you have done. — Henriette Harmse, Feb 03 '18 at 04:01

Gustav · Answer 1 · 2018-02-04T08:48:31.623

First, you will need a proper function calculating the ages (see below) as DateDiff only returns the difference in calendar years.

Then I would create and save a query like this:

SELECT 
    Years([birth_date],[follow_date]) AS age, 
    tbl_biography.individual_id, 
    tbl_follow.follow_id
FROM 
    tbl_biography 
INNER JOIN 
    tbl_follow 
    ON tbl_biography.individual_id = tbl_follow.individual_id
ORDER BY 
    Years([birth_date],[follow_date]), 
    tbl_biography.individual_id, 
    tbl_follow.follow_id;

That will produce this output:

age individual_id   follow_id
0   BB              0001
1   AA              0005
2   BB              0003
2   CC              0004
5   BB              0002

Now, use this query and tbl_scan in a new query where you join these on follow_id. Finally, add fields from tbl_scan and sort/filter as needed.

The function:

Public Function Years( _
  ByVal datDate1 As Date, _
  ByVal datDate2 As Date, _
  Optional ByVal booLinear As Boolean) _
  As Integer

' Returns the difference in full years between datDate1 and datDate2.
'
' Calculates correctly for:
'   negative differences
'   leap years
'   dates of 29. February
'   date/time values with embedded time values
'   negative date/time values (prior to 1899-12-29)
'
' Optionally returns negative counts rounded down to provide a
' linear sequence of year counts.
' For a given datDate1, if datDate2 is decreased step wise one year from
' returning a positive count to returning a negative count, one or two
' occurrences of count zero will be returned.
' If booLinear is False, the sequence will be:
'   3, 2, 1, 0,  0, -1, -2
' If booLinear is True, the sequence will be:
'   3, 2, 1, 0, -1, -2, -3
'
' If booLinear is False, reversing datDate1 and datDate2 will return
' results of same absolute Value, only the sign will change.
' This behaviour mimics that of Fix().
' If booLinear is True, reversing datDate1 and datDate2 will return
' results where the negative count is offset by -1.
' This behaviour mimics that of Int().

' DateAdd() is used for check for month end of February as it correctly
' returns Feb. 28. when adding a count of years to dates of Feb. 29.
' when the resulting year is a common year.
'
' 2000-11-03. Cactus Data ApS, CPH.
' 2000-12-16. Leap year correction modified to be symmetrical.
'             Calculation of intDaysDiff simplified.
'             Renamed from YearsDiff() to Years().
' 2000-12-18. Introduced cbytMonthDaysMax.
' 2007-06-22. Version 2. Complete rewrite.
'             Check for month end of February performed with DateAdd()
'             after idea of Markus G. Fischer.

  Dim intDiff   As Integer
  Dim intSign   As Integer
  Dim intYears  As Integer

  ' Find difference in calendar years.
  intYears = DateDiff("yyyy", datDate1, datDate2)
  ' For positive resp. negative intervals, check if the second date
  ' falls before, on, or after the crossing date for a full 12 months period
  ' while at the same time correcting for February 29. of leap years.
  If DateDiff("d", datDate1, datDate2) > 0 Then
    intSign = Sgn(DateDiff("d", DateAdd("yyyy", intYears, datDate1), datDate2))
    intDiff = Abs(intSign < 0)
  Else
    intSign = Sgn(DateDiff("d", DateAdd("yyyy", -intYears, datDate2), datDate1))
    If intSign <> 0 Then
      ' Offset negative count of years to continuous sequence if requested.
      intDiff = Abs(booLinear)
    End If
    intDiff = intDiff - Abs(intSign < 0)
  End If

  ' Return count of years as count of full 12 months periods.
  Years = intYears - intDiff

End Function

To calculate with months, use a similar function to get it right:

Public Function Months( _
  ByVal datDate1 As Date, _
  ByVal datDate2 As Date, _
  Optional ByVal booLinear As Boolean) _
  As Integer

' Returns the difference in full months between datDate1 and datDate2.
'
' Calculates correctly for:
'   negative differences
'   leap years
'   dates of 29. February
'   date/time values with embedded time values
'   negative date/time values (prior to 1899-12-29)
'
' Optionally returns negative counts rounded down to provide a
' linear sequence of month counts.
' For a given datDate1, if datDate2 is decreased stepwise one month from
' returning a positive count to returning a negative count, one or two
' occurrences of count zero will be returned.
' If booLinear is False, the sequence will be:
'   3, 2, 1, 0,  0, -1, -2
' If booLinear is True, the sequence will be:
'   3, 2, 1, 0, -1, -2, -3
'
' If booLinear is False, reversing datDate1 and datDate2 will return
' results of same absolute Value, only the sign will change.
' This behaviour mimics that of Fix().
' If booLinear is True, reversing datDate1 and datDate2 will return
' results where the negative count is offset by -1.
' This behaviour mimics that of Int().

' DateAdd() is used for check for month end of February as it correctly
' returns Feb. 28. when adding a count of months to dates of Feb. 29.
' when the resulting year is a common year.
'
' 2010-03-30. Cactus Data ApS, CPH.

  Dim intDiff   As Integer
  Dim intSign   As Integer
  Dim intMonths As Integer

  ' Find difference in calendar months.
  intMonths = DateDiff("m", datDate1, datDate2)
  ' For positive resp. negative intervals, check if the second date
  ' falls before, on, or after the crossing date for a 1 month period
  ' while at the same time correcting for February 29. of leap years.
  If DateDiff("d", datDate1, datDate2) > 0 Then
    intSign = Sgn(DateDiff("d", DateAdd("m", intMonths, datDate1), datDate2))
    intDiff = Abs(intSign < 0)
  Else
    intSign = Sgn(DateDiff("d", DateAdd("m", -intMonths, datDate2), datDate1))
    If intSign <> 0 Then
      ' Offset negative count of months to continuous sequence if requested.
      intDiff = Abs(booLinear)
    End If
    intDiff = intDiff - Abs(intSign < 0)
  End If

  ' Return count of months as count of full 1 month periods.
  Months = intMonths - intDiff

End Function

_How could I, for example, combine that with what Andrew suggested in order to get what I'm looking for?_ Well, you don't specify your required output, but, as I wrote: _use this query and tbl_scan in a new query where you join these on follow_id. Finally, add fields from tbl_scan and sort/filter as needed._ — Gustav, Feb 04 '18 at 08:44

score 0 · Answer 2 · answered Feb 03 '18 at 06:51

You can calculate the 'Age At Follow' using the DateDiff function to get the difference between the birth_date and the follow_date:

DateDiff("yyyy", tbl_biography.birth_date, tbl_follow_id.follow_date)

You can place that expression into a field in the Query Designer.

You need to join all three tables (because you wants results from tbl_scan_id and you need tbl_follow_id for the follow_date and tbl_biography for the birth_date).

If your tables have appropriate relationships defined then the query designer will join them, but if not, you'll need to join them manually (dragging individual_ids together and follow_ids together).

Sample SQL is below:

SELECT 
  tbl_scan_id.*, 
  DateDiff("yyyy", tbl_biography.birth_date, tbl_follow_id.follow_date) AS AgeAtFollow, 
  tbl_follow_id.follow_date, 
  tbl_biography.individual_id, 
  tbl_biography.birth_date
FROM 
  (tbl_follow_id 
  INNER JOIN tbl_biography 
    ON tbl_follow_id.individual_id = tbl_biography.individual_id) 
  INNER JOIN tbl_scan_id 
    ON tbl_follow_id.follow_id = tbl_scan_id.follow_id;

Then, just filter on the AgeAtFollow field as you need (or group by it, etc). You can also re-purpose that query and add in other biography fields to filter on (sex, mother, demographic, etc).

Andrew, Gustav, thank you for your input. I tried the query that Andrew suggested, and it works, but as Gustav pointed out it returns a column of 0, 1, 2, etc. years. For some reason, individuals that are both under and over 1 year old have the number 1 assigned to them. I think this has to do with the problem Gustav pointed out. — smlee87, Feb 03 '18 at 14:30
I tried using months instead of years, and this worked better. I can select all the individuals through 12 months of age, then 12-24, and so on. However, it would be nice to get these into decimal years or something. Gustav, I don't quite understand your query. How could I, for example, combine that with what Andrew suggested in order to get what I'm looking for? Is that even possible? — smlee87, Feb 03 '18 at 14:30
You can use `DateDiff("d", date1, date2)/365` to get decimal years. — andrew, Feb 03 '18 at 23:07
"Worked better". No, it works just as bad. I've appended a function that calculates months correctly to my answer. — Gustav, Feb 04 '18 at 08:49

SQL Sort data into age bins based on date of birth and timestamp of datapoint

2 Answers2