0

I have gone through some relevant merging questions but my problem is slightly different from the existing ones. I have a table in my PostgreSQL 9.5 database containing four columns i.e., Segment (unique group), height (numeric), lower_limit (numeric) and upper_limit (numeric). The sample data is as follows:

Segment height  lower_limit upper_limit
A       19.3    112         142
A       19.3    142         172
A       20.3    172         202
A       20.3    202         232
A       19.3    232         262
A       19.3    262         292
B       22.1    203         233
B       22.1    233         263
B       22.1    263         293
B       22.1    293         323
B       22.1    323         353
B       22.1    353         383
C       18.9    136         166
C       18.9    166         196
C       18.9    196         226
C       27.1    286         316
C       27.1    316         346
C       6.5     346         376
C       6.5     376         406

I need to conditionally merge rows based on the difference of height values. I would try to explain in steps:

  • Starting from the first height, check if the difference between preceding and succeeding row is less or equal to 1
  • If condition fulfills merge those rows with lower limit of first row and upper limit of merged row
  • if all rows are merged in a group then select the most common height with lower limit of first and upper limit of last merged row
  • repeat this for other groups

    Based on above, the desired output could be like:

    Segment height lower_limit upper_limit A 19.3 112 292 B 22.1 203 383 C 18.9 136 226 C 27.1 286 346 C 6.5 346 406

Can somebody help me so that I could conditionally merge rows based on height difference values?

khajlk
  • 791
  • 1
  • 12
  • 32

2 Answers2

1

Assuming lower_limit column can be used for ordering, you can use

select segment,mode() within group(order by height),min(lower_limit),max(upper_limit) 
from (select t.*
      ,sum(case when abs(height-prev) <= 1 then 0 else 1 end) over(partition by segment order by lower_limit) as grp
      from (select t.*
            ,lag(height) over(partition by segment order by lower_limit) as prev
            from tbl t
           ) t
    ) t
group by segment,grp
Vamsi Prabhala
  • 48,685
  • 4
  • 36
  • 58
  • I modified last two lines. I added: group by grp, segment Order by grp, segment. I got A, B, C, C, C as desired. All values are fine except the first segment A. For A; it gives, 20.3, 112, 292. Any idea why ? – khajlk Jun 10 '17 at 14:50
  • @JibranKhan ..check the edit..using `mode` gives you the most frequent height per segment and grp..which is what you need. – Vamsi Prabhala Jun 10 '17 at 15:21
  • I have one question, If I have one row or two rows containing zero then how can I break that row/rows into a separate group with their lower and upper limit. I think changes need to be made in this line: sum(case when abs(height-prev) <= 1 then 0 else 1 end) - OR- you suggest me to ask this as a separate question? – khajlk Jun 11 '17 at 16:07
1
  -- setting reset points
  with  b as
  (
      select segment, height, lower_limit, upper_limit, 
             case when lag(height) over (partition by segment order by segment, height) is null
                       or abs(height - lag(height) over (partition by segment order by segment, height)) > 1
                  then 1 end as is_reset
      from   foo
  )
     -- setting groups
     , c as
     (
         select segment, height, lower_limit, upper_limit,
                sum(is_reset) over (order by segment, height) as grp
         from b
     )
       -- finding most common height
           select segment, mode() within group (order by height),
                  min(lower_limit) as lower_limit, 
                  max(upper_limit) as upper_limit
           from c
           group by segment, grp
segment |  mode | lower_limit | upper_limit
:------ | ----: | ----------: | ----------:
A       | 19.30 |         112 |         292
B       | 22.10 |         203 |         383
C       |  6.50 |         346 |         406
C       | 18.90 |         136 |         226
C       | 27.10 |         286 |         346

dbfiddle here

Community
  • 1
  • 1
McNets
  • 10,352
  • 3
  • 32
  • 61
  • Thanks a lot. I tested your code in dbfiddle. It worked fine there. Exactly same table is in my Postgres db but it returned six rows i.e., A, B, C, C, C, C. One row C (18.9, 136, 226) is returned twice. Any idea why? – khajlk Jun 10 '17 at 17:11
  • execute the first select and be sure it returns the correct order. – McNets Jun 10 '17 at 17:21
  • Well, I reformated the whole code to make sure i am not making any mistake but still the same result. – khajlk Jun 10 '17 at 17:34
  • does this query `select segment, height, lower_limit, upper_limit, row_number() over (partition by segment order by (select 1)) id from foo` returns the correct order? – McNets Jun 10 '17 at 17:35
  • No it doesn't. One row C, 18.9..... is at the bottom of result which is not correct if you look at the sample data. – khajlk Jun 10 '17 at 17:39
  • does your table any field to use as `order by`? – McNets Jun 10 '17 at 17:40
  • It didn't have the primary key. So, I order by segments. – khajlk Jun 10 '17 at 17:48
  • Yes, now it returned 5 rows as expected. But, heights are returned as 19.3, 22.1, 6.5, 18.9, 27.1 which is not as desired output. Please see above. – khajlk Jun 10 '17 at 18:07