Postgresql Fill Gaps - Matrix of latest available information for a given branch

Question

I have a table with indexes of imported files, with dates and branches of each imported files.

Now I need to do a consolidation of multiple branches, so, that I have to duplicate the information from some branches when we have holidays, so that the data is consistent, basically I need to fill these gaps with the latest available information.

I tried doing some self-joins with ranking in order to shift between them and get the previous data, but it didn't work.

What I have is a table:

rundate, branch, imported 
2015-04-01, PL1, TRUE 
2015-04-01, ES1, TRUE 
2015-04-01, CZ4, TRUE 
2015-04-02, PL1, TRUE 
2015-04-02, ES1, TRUE 
2015-04-02, CZ4, TRUE   
2015-04-03, ES1, TRUE 
2015-04-03, CZ4, TRUE

In this example, I would like to make a query that returns:

gap_date, branch, real_date
2015-04-03, PL1, 2015-04-02

This table is quite small (couple thousand lines), so, performance shouldn't be a big issue.

Any idea on how can I achieve that?

Now I am using a function that receives the rundate and branch of the gap dates as parameters, and answers the latest before the date passed as parameter (using max(rundate) where rundate <= '$1')

Thanks!

please feel free to give a feedback on [below answer](http://stackoverflow.com/a/29631261/3208640) — void, Apr 14 '15 at 15:43

score 1 · Accepted Answer · answered Apr 14 '15 at 15:22

you can use outer join, subquery and cross join:

Schema:

create table tbl(rundate date, 
                 branch varchar(10), 
                 imported bool);
insert into tbl values('2015-04-01', 'PL1', TRUE),
('2015-04-01', 'ES1', TRUE), 
('2015-04-01', 'CZ4', TRUE), 
('2015-04-02', 'PL1', TRUE), 
('2015-04-02', 'ES1', TRUE), 
('2015-04-02', 'CZ4', TRUE),  
('2015-04-03', 'ES1', TRUE), 
('2015-04-03', 'CZ4', TRUE);

Query:

select q.rundate as gap_date,q.branch,
       (select max(tt.rundate) 
        from tbl tt
        where tt.rundate<q.rundate and tt.branch=q.branch)
        as real_date
from tbl t
right outer join(
         select rundate,branch from (
                select distinct rundate from tbl) t1
                cross join (
                select distinct branch from tbl)t2
          )q 
on t.rundate=q.rundate and t.branch=q.branch
where t.branch is null

Result:

gap_date    branch  real_date
2015-04-03  PL1     2015-04-02

That worked great! I didn't know I could reference from subqueries like this. Thanks a lot for your help! — Fulano da Silva Sauro, Apr 15 '15 at 06:59

score 0 · Answer 2 · answered Apr 14 '15 at 14:49

You have to select all unique dates in one set, all unique branches in another - make a cartesian product of it - and then you can check for which combination you have a gap... this is what I mean:

CREATE TEMPORARY TABLE _matrix
SELECT
t1.rundate,
t2.branch
(SELECT DISTINCT rundate FROM yourtable) t1,
(SELECT DISTINCT branch FROM yourtable) t2

Then you can find gaps using LEFT JOIN and "WHERE ... IS NULL" statement:

SELECT
m.rundate,
m.branch
FROM _matrix m
LEFT JOIN yourtable yt ON(yt.rundate = m.rundate AND yt.branch = m.branch)
WHERE yt.rundate IS NULL

Of course the same can be achieved without temporary tables - using just a subquery. Especially when the performance is not crucial.

Yes, I got this far, I am using some CTE to keep it readable, I am extrapolating all the branches and all the dates (select distinct from the table), then I join with the real table and identify the gaps. What I am not being able to get is the "previous available date for this gap". — Fulano da Silva Sauro, Apr 14 '15 at 15:03

joop · Answer 3 · 2015-04-14T16:30:24.960

This is the carthesian product of both domains solution from @voycheck except the records that actually exist.

WITH br AS (
        SELECT DISTINCT branch AS branch FROM tbl
        )
, mima AS (
        SELECT MIN(rundate) as mi
             , MAX(rundate) as ma
        FROM tbl)
, rng AS (
        SELECT generate_series( mima.mi, mima.ma, '1 day'::interval)::date AS rundate
        FROM mima
        )
SELECT * FROM rng
JOIN br ON NOT EXISTS ( -- cartesian product EXCEPT
        SELECT *
        FROM tbl t
        WHERE t.branch = br.branch
        AND t.rundate = rng.rundate
        )
        ;

Postgresql Fill Gaps - Matrix of latest available information for a given branch

3 Answers3