In order to establish seasonal effects on energy use, I need to align the energy use information that I have from a billing database with monthly temperatures.
I'm working with a billing dataset that has bills of varying lengths and start and end dates, and I'd like to obtain the monthly average for each account within each month. For example, I have a billing database that has the following characteristics:
acct amount begin end days
1 2242 11349 2009-10-06 2009-11-04 29
2 2242 12252 2009-11-04 2009-12-04 30
3 2242 21774 2009-12-04 2010-01-08 35
4 2242 18293 2010-01-08 2010-02-05 28
5 2243 27217 2009-10-06 2009-11-04 29
6 2243 117 2009-11-04 2009-12-04 30
7 2243 14543 2009-12-04 2010-01-08 35
I would like to figure out how to coerce these somewhat irregular time series (for each account) to get the average amount per day within each month that is spanned within each bill, such that:
acct amount begin end days avgamtpday
1 2242 11349 2009-10-01 2009-10-31 31 X
2 2242 12252 2009-11-01 2009-11-30 30 X
3 2242 21774 2009-12-01 2010-12-31 31 X
4 2242 18293 2010-01-01 2010-01-31 31 X
4 2242 18293 2010-02-01 2010-02-28 28 X
5 2243 27217 2009-10-01 2009-10-31 31 X
6 2243 117 2009-11-01 2009-11-30 30 X
7 2243 14543 2009-12-01 2009-12-31 30 X
7 2243 14543 2010-01-01 2010-01-31 31 X
I'm fairly agnostic to whichever tool can do this, since I only have to do this once.
An additional wrinkle is the table is about 150,000 rows long, which is not really very big by most standards, but big enough to make a loop solution in R difficult. I've investigated using the zoo, xts, and tempdisagg packages in R. I started writing a really ugly loop that would split each bill, then create one row for each month within an existing bill, and then tapply() to summarize by accts and months, but honestly, couldn't see how to do it efficiently.
In MySQL, I've tried this:
create or replace view v3 as select 1 n union all select 1 union all select 1;
create or replace view v as select 1 n from v3 a, v3 b union all select 1;
set @n = 0;
drop table if exists calendar; create table calendar(dt date primary key);
insert into calendar
select cast('2008-1-1' + interval @n:=@n+1 day as date) as dt from v a, v b, v c, v d, v e, v;select acct, amount, begin, end, billAmtPerDay, sum(billAmtPerDay), MonthAmt, count() Days, sum(billAmtPerDay)/count() AverageAmtPerDay, year(dt), month(dt) FROM ( select *, amount/days billAmtPerDay from bills b inner join calendar c on dt between begin and end and begin <> dt) x group by acct, amount, begin, end, billAmtPerDay, year(dt), month(dt);
But for reasons I don't understand, my server doesn't like this table, and gets hung up on the inner join, even when I stage the different calculations. I'm investigating if there are any temporary memory limits on it.
Thanks!