0

I have the following SQL problem, in which I am actually looking for the most efficient way to do this. Suppose I have the following table:

r|Company - StartPeriod - EndPeriod - NumberOfEmployees - NumberOfMachines ...
1|A - 2014-1-1 - 2014-2-5 - 30 - 1
2|A - 2014-2-6 - 2014-3-7 - 30 - 1
3|A - 2014-3-8 - 2014-4-9 - 31 - 1
4|A - 2014-4-9 - 2014-5-1 - 32 - 1
5|B - 2014-1-2 - 2014-2-1 - 20 - 2
6|B - 2014-2-2 - 2014-3-1 - 20 - 2
7|B - 2014-3-2 - 2014-4-1 - 20 - 2

Notice the the redundant amount of data? This is because the companies get sampled now and then and therefore the periods are actually kind of random in lengths.

Now the question is, how can merge the rows if and only if the variables NumberOfEmployees, NumberOfMachines etcetera are the same, but StartPeriod and Endperiod differ? Of course I would like to replace the start period with the earliest one and the endperiod with the latest one.

The r is just an indicator, not a variable in the database. In this case, 1 and 2 will be merged, but 3,4 will be kept, since it has different values (namely it has 31 employees instead of 30).

This is in my opinion such an hard problem, that I am not even sure whether SQL can do this.

The resulting output would be:

r|Company - StartPeriod - EndPeriod - NumberOfEmployees - NumberOfMachines ...
1|A - 2014-1-1 - 2014-3-7 - 30 - 1
2|A - 2014-3-8 - 2014-4-9 - 31 - 1
3|A - 2014-4-9 - 2014-5-1 - 32 - 1
4|B - 2014-1-2 - 2014-4-1 - 20 - 2

Thanks for everything!

Snowflake
  • 2,869
  • 3
  • 22
  • 44

2 Answers2

1

You may be able to just use group by:

select company, min(startperiod) as startperiod, max(endperiod) as endperiod,
       numberofemployees, numberofmachines . .
from table t
group by company, numberofemployees, numberofmachines;

If this doesn't solve the problem, please edit the question and provide more information on the additional logic needed for merging rows.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Hmm, I never looked at this way... I will try this! It looks promising. Thanks! It works! I think, wow it was pretty simple (I need six minutes before I can accept your answer, THANKS!) – Snowflake Mar 12 '15 at 16:00
  • 1
    Note that if the "same" row occurs later on that row will be merged in as well. – Lennart - Slava Ukraini Mar 12 '15 at 16:07
0

You can use an "island and gap"" strategy:

select r, company, min(startperiod), max(EndPeriod)
     , NumberOfEmployees, NumberOfMachines, ..., grp
from (
    select r, company, startperiod, EndPeriod, ...
         , row_number() over (order by r)
         - row_number() over (partition by company, 
                                           NumberOfEmployees,
                                           NumberOfMachines, ...  
                              order by r) as grp
    from T
)
group by r, company, NumberOfEmployees, NumberOfMachines, ..., grp
Lennart - Slava Ukraini
  • 6,936
  • 1
  • 20
  • 32