4

I have monthly time series data in table where dates are as a last day of month. Some of the dates are missing in the data. I want to insert those dates and put zero value for other attributes. Table is as follows:

id     report_date   price
1       2015-01-31    40
1       2015-02-28    56
1       2015-04-30    34
2       2014-05-31    45
2       2014-08-31    47

I want to convert this table to

id     report_date   price
1       2015-01-31    40
1       2015-02-28    56
1       2015-03-31    0
1       2015-04-30    34
2       2014-05-31    45
2       2014-06-30    0
2       2014-07-31    0
2       2014-08-31    47

Is there any way we can do this in Postgresql? Currently we are doing this in Python. As our data is growing day by day and its not efficient to handle I/O just for one task.

Thank you

liferacer
  • 2,473
  • 2
  • 17
  • 16

2 Answers2

1

You can do this using generate_series() to generate the dates and then left join to bring in the values:

with m as (
      select id, min(report_date) as minrd, max(report_date) as maxrd
      from t
      group by id
     )
select m.id, m.report_date, coalesce(t.price, 0) as price 
from (select m.*, generate_series(minrd, maxrd, interval '1' month) as report_date
      from m
     ) m left join
     t
     on m.report_date = t.report_date;

EDIT:

Turns out that the above doesn't quite work, because adding months to the end of month doesn't keep the last day of the month.

This is easily fixed:

with t as (
      select 1 as id, date '2012-01-31' as report_date, 10 as price union all
      select 1 as id, date '2012-04-30', 20
     ), m as (
      select id, min(report_date) - interval '1 day' as minrd, max(report_date) - interval '1 day' as maxrd
      from t
      group by id
     )
select m.id, m.report_date, coalesce(t.price, 0) as price 
from (select m.*, generate_series(minrd, maxrd, interval '1' month) + interval '1 day' as report_date
      from m
     ) m left join
     t
     on m.report_date = t.report_date;

The first CTE is just to generate sample data.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
0

This is a slight improvement over Gordon's query which fails to get the last date of a month in some cases.

Essentially you generate all the month end dates between the min and max date for each id (using generate_series) and left join on this generated table to show the missing dates with 0 price.

with minmax as (
      select id, min(report_date) as mindt, max(report_date)  as maxdt
      from t
      group by id
     )
select m.id, m.report_date, coalesce(t.price, 0) as price 
from (select *, 
      generate_series(date_trunc('MONTH',mindt+interval '1' day),
                      date_trunc('MONTH',maxdt+interval '1' day), 
                      interval '1' month) - interval '1 day' as report_date
      from minmax
     ) m 
left join t on m.report_date = t.report_date

Sample Demo

Vamsi Prabhala
  • 48,685
  • 4
  • 36
  • 58