0

I have data looking like this enter image description here

What I am trying to achieve is data for historgram that would count values into specific ranges. For category A value range 1-100 and for category B value range 0-125 where value for category C = 5. The problem I have that is data in multiplerows and I need to filter first on C and then count values into ranges to display histogram. To get counts lets say per 10 seconds looking like this enter image description here

Code to generate data:

CREATE TEMP TABLE sample (
     ts timestamp
    ,category varchar(2)
    , val int)
    
insert into sample values 
(to_timestamp('01.01.2018 08:00:01', 'dd-mm-yyyy hh24:mi:ss'), 'A', 12),
(to_timestamp('01.01.2018 08:00:02', 'dd-mm-yyyy hh24:mi:ss'), 'A', 44),
(to_timestamp('01.01.2018 08:00:03', 'dd-mm-yyyy hh24:mi:ss'), 'C', 1),
(to_timestamp('01.01.2018 08:00:04', 'dd-mm-yyyy hh24:mi:ss'), 'B', 24),
(to_timestamp('01.01.2018 08:00:05', 'dd-mm-yyyy hh24:mi:ss'), 'B', 111),
(to_timestamp('01.01.2018 08:00:06', 'dd-mm-yyyy hh24:mi:ss'), 'C', 5),
(to_timestamp('01.01.2018 08:00:07', 'dd-mm-yyyy hh24:mi:ss'), 'A', 145),
(to_timestamp('01.01.2018 08:00:01', 'dd-mm-yyyy hh24:mi:ss'), 'B', 16),
(to_timestamp('01.01.2018 08:00:01', 'dd-mm-yyyy hh24:mi:ss'), 'C', 47),
(to_timestamp('01.01.2018 08:00:02', 'dd-mm-yyyy hh24:mi:ss'), 'C', 5),
(to_timestamp('01.01.2018 08:00:02', 'dd-mm-yyyy hh24:mi:ss'), 'B', 34),
(to_timestamp('01.01.2018 08:00:03', 'dd-mm-yyyy hh24:mi:ss'), 'B', 111),
(to_timestamp('01.01.2018 08:00:03', 'dd-mm-yyyy hh24:mi:ss'), 'C', 5),
(to_timestamp('01.01.2018 08:00:01', 'dd-mm-yyyy hh24:mi:ss'), 'A', 19),
(to_timestamp('01.01.2018 08:00:01', 'dd-mm-yyyy hh24:mi:ss'), 'B', 46),
(to_timestamp('01.01.2018 08:00:01', 'dd-mm-yyyy hh24:mi:ss'), 'C', 57)

I thought if I pivot data like so

s

elect 
    ts, 
    category,
    case when category = 'A' then val end as "A",
    case when category = 'B' then val end as "B",
    case when category = 'C' then val end as "C"
from sample
order by ts

then have problem with pivot nulls

enter image description here

Kylo
  • 109
  • 8
  • Please provide the data as text, not as images. Preferably provide the create table and insert statements in a fiddle, otherwise we have to reproduce your environment from scratch. – Jim Jones Aug 12 '21 at 14:01
  • @JimJones that is very good point, just added code to description – Kylo Aug 12 '21 at 14:33
  • Why category b 0-125 has a count of 1 instead of 2? – Salman A Aug 12 '21 at 15:04

1 Answers1

1

Here it is:

with periods(pts) as 
(
 select * 
 from generate_series
 (
   timestamp '2018-01-01 08:00:00', 
   timestamp '2018-01-01 08:01:00', 
   interval '10 seconds'
 ) ts
)
select  pts period_start, 
        pts + interval '10 seconds' period_end,
        lat.cat_a, 
        lat.cat_b, 
        lat.cat_c
from periods 
cross join lateral 
(
 select  count(1) filter (where category = 'A' and val between 0 and 100) as cat_a, 
         count(1) filter (where category = 'B' and val between 0 and 125) as cat_b, 
         count(1) filter (where category = 'C' and val = 5) as cat_c
 from sample 
 where ts >= pts and ts < pts + interval '10 seconds'
) lat; 
period_start period_end cat_a cat_b cat_c
2018-01-01 08:00:00 2018-01-01 08:00:10 2 2 1
2018-01-01 08:00:10 2018-01-01 08:00:20 0 0 0
2018-01-01 08:00:20 2018-01-01 08:00:30 0 0 0
2018-01-01 08:00:30 2018-01-01 08:00:40 0 0 0
2018-01-01 08:00:40 2018-01-01 08:00:50 0 0 0
2018-01-01 08:00:50 2018-01-01 08:01:00 0 0 0
2018-01-01 08:01:00 2018-01-01 08:01:10 0 0 0

One-row version is simple:

select  min(ts) period_start,
        max(ts) period_end,
        count(1) filter (where category = 'A' and val between 0 and 100) as cat_a, 
        count(1) filter (where category = 'B' and val between 0 and 125) as cat_b, 
        count(1) filter (where category = 'C' and val = 5) as cat_c
from sample; 

Added after the clarification comments

select * from (<the first version of the query here>) t where cat_c > 0;
Stefanov.sm
  • 11,215
  • 2
  • 21
  • 21
  • You are really good, one question why it is generating so many rows, I was expecting one row per group ? – Kylo Aug 12 '21 at 14:50
  • I expected that `sample` contains a lot of records for a long period of time. So it generates one row for every 10-second interval. If you only need one row then it is simpler, you do not need the `periods(pts)` CTE at all. – Stefanov.sm Aug 12 '21 at 14:55
  • Thanks for that it is almost perfect one detail I want count them for A and B and filter on C, so it should show range for A and B but only where category C = 5. – Kylo Aug 13 '21 at 06:45
  • I am not sure that I understand. Category can not be both A and C. – Stefanov.sm Aug 13 '21 at 07:02
  • The count and categories are fine, for both A and B but C should not be used in count as third category but just to filter all values. So do count for A and B when C = 5. I thought I need to first pivot all three values to have them as columns then filter all values based on C and then do the range. I added code and picture with issue. If I have this in three columns then I can do the ranges and filter on C – Kylo Aug 13 '21 at 07:53
  • Added to answer – Stefanov.sm Aug 13 '21 at 09:08
  • Thank you very much, you have been really helpful. – Kylo Aug 13 '21 at 09:18