I am having a huge issue turning a time stamped status table into a flat table that can be queried very quickly.
I essentially have a table like this:
╔══════════╦═══════════╦══════════╦══════════╦═══════════╦══════════╦
║ PersonID ║ Firstname ║ Lastname ║ status ║ startdate ║ endate ║
║ 10233 ║ stacy ║ adamns ║ active ║ 12-23-13 ║ 02-11-14 ║
║ 10233 ║ stacy ║ adamns ║ pending ║ 02-11-14 ║ 03-09-14 ║
║ 10233 ║ stacy ║ adamns ║ inactive ║ 03-09-14 ║ 12-31-99 ║
║ 10244 ║ steve ║ smith ║ active ║ 01-07-14 ║ 12-31-99 ║
╚══════════╩═══════════╩══════════╩══════════╩═══════════╩══════════╩
and turn it into:
╔══════════╦══════════╦═══════════╦══════════╦════════╗
║ Date ║ PersonID ║ Firstname ║ Lastname ║ status ║
║ 12-23-13 ║ 10233 ║ stacy ║ adamns ║ active ║
║ 12-24-13 ║ 10233 ║ stacy ║ adamns ║ active ║
║ 12-25-13 ║ 10233 ║ stacy ║ adamns ║ active ║
║ 12-26-13 ║ 10233 ║ stacy ║ adamns ║ active ║
║ ║ ║ ║ ║ ║
╚══════════╩══════════╩═══════════╩══════════╩════════╝
This table has 28 additional columns with various things describing the employee (they are static and don't change e.g Height) and it is 48 million rows long...
I need to know how many employees were in the state of "active" for each day for the past 2 years.
Now with a smaller date range or set of data this is pretty easy, I would just join it with a calendar table with something similar to this:
Create Table People_history as
Select Day_id,Firstname,Lastname,status
from People
Join Time_calendar on day_id between startdate and endate;
I have calculated that the resulting table will become 7.8 billion rows and over 3 terrabytes; however my database cannot even finish the query because it runs out of Temp memory. With a cursor I can get around the memory problem but it takes over 24 hours to run... I only need to do this once so maybe that's what I'll be stuck doing, but I figured I would ask you guys first.
Should I be looking at a different database to do this sort of analysis or just a more efficient method?
I've looked into Cassandra which recommends creating columns for the time intervals or MongoDB which you could throw the intervals and status into their own hash on each person. Are these good alternatives?