0

In my database, I have a table with 3 columns: datetime, parameter, value. My goal is to get a subset of the data. I want to get all rows coming after parameter = 'B' and value > 5 (include this row in the result set). Leave out all rows coming after parameter = 'B' and value <=5 (include this row in the result set).

source table and expected result

In other words, I want to use VALUE in rows with parameter = B as a flag (pseudocode):


    include_flag = 0
    result_set = empty table
    
    for row in rows:
        if parameter = B and value > 5:
            result_set.append(row)
            include_flag = 1
        elif parameter = B and value <= 5:
            result_set.append(row)
            include_flag = 0
        elif parameter <> B:
            if include_flag = 1:
                result_set.append(row)
            elif include_flag = 0:
                skip(row)

source table:

date_time parameter value
1.9.2021 12:34:00 A 0.50
2.9.2021 14:01:00 B 7.40
2.9.2021 21:52:00 C 85.40
3.9.2021 3:15:00 B 3.80
4.9.2021 1:42:00 C 67.30
5.9.2021 12:34:00 A 0.3
6.9.2021 12:34:00 C 76.50
6.9.2021 17:22:00 A 0.40
6.9.2021 19:37:00 B 8.10
7.9.2021 12:34:00 C 91.70
7.9.2021 22:12:00 C 87.60
8.9.2021 7:17:00 A 0.60
9.9.2021 5:34:00 B 5.80
9.9.2021 12:34:00 B 4.90
10.9.2021 19:56:00 A 0.60

desired result set:

date_time parameter value
2.9.2021 14:01:00 B 7.40
2.9.2021 21:52:00 C 85.40
3.9.2021 3:15:00 B 3.80
6.9.2021 19:37:00 B 8.10
7.9.2021 12:34:00 C 91.70
7.9.2021 22:12:00 C 87.60
8.9.2021 7:17:00 A 0.60
9.9.2021 5:34:00 B 5.80
9.9.2021 12:34:00 B 4.90

Also, I am interested in a solution leading to an alternative result set, which has the same number of rows as the source table, but contains only those values that are included in the primary result set. (so the rows coming after (parameter = 'B' and value <=5) are also included, just the values are dismissed)

  • How would the solutions differ?

alternative result set:

date_time parameter value
1.9.2021 12:34:00 A NaN
2.9.2021 14:01:00 B 7.40
2.9.2021 21:52:00 C 85.40
3.9.2021 3:15:00 B 3.80
4.9.2021 1:42:00 C NaN
5.9.2021 12:34:00 A NaN
6.9.2021 12:34:00 C NaN
6.9.2021 17:22:00 A NaN
6.9.2021 19:37:00 B 8.10
7.9.2021 12:34:00 C 91.70
7.9.2021 22:12:00 C 87.60
8.9.2021 7:17:00 A 0.60
9.9.2021 5:34:00 B 5.80
9.9.2021 12:34:00 B 4.90
10.9.2021 19:56:00 A NaN
  • What is the optimal solution to this problem(s)?

Any help would be appreciated.

EDIT:

query:

SELECT t.parameter, t.date_time, t.value, B_over_limit = CASE 
   WHEN t.parameter = 'B' AND t.value > 5.0 THEN 1 
   WHEN t.parameter = 'B' AND t.value <= 5.0 THEN 0 
   ELSE null END
FROM table t ORDER BY t.date_time

brings:

date_time parameter value B_over_limit
1.9.2021 12:34:00 A 0.50 null
2.9.2021 14:01:00 B 7.40 1
2.9.2021 21:52:00 C 85.40 null
3.9.2021 3:15:00 B 3.80 0
4.9.2021 1:42:00 C 67.30 null
5.9.2021 12:34:00 A 0.3 null
6.9.2021 12:34:00 C 76.50 null
6.9.2021 17:22:00 A 0.40 null
6.9.2021 19:37:00 B 8.10 1
7.9.2021 12:34:00 C 91.70 null
7.9.2021 22:12:00 C 87.60 null
8.9.2021 7:17:00 A 0.60 null
9.9.2021 5:34:00 B 5.80 1
9.9.2021 12:34:00 B 4.90 0
10.9.2021 19:56:00 A 0.60 null
  • filling all nulls with last previous 0 or 1 would be almost the final result. How can I achieve that?
  • i.e. using the previous query as a subquery s, and then something like(?):
SELECT s.parameter, s.date_time, s.values, 
LAST_VALUE(s.B_over_limit) OVER(...) FROM subquery s ORDER BY date_time

result of subquery and desired intermediate result

okone
  • 3
  • 2
  • An `EXISTS` would be one method; which if you have the right indexes should be fairly performant. – Thom A Oct 04 '21 at 12:29

1 Answers1

0

You need to have a unique column (id) and use “OUTER APPLY” to get the expected results.

As your sample data does not have a unique column, I created a temporary table to create a unique id column and store the sample data.

  1. Sample Input:

enter image description here

  1. create a temp table with an identity column and all columns from Input data.
  2. insert Input data into temp table order by date_time to maintain the data order.

Temp table:

enter image description here

  1. write your query from this temp table to generate the B_over_limit column and insert it into another temp table for further use.

enter image description here

  1. Now write a query with outer apply to get the previous value of a column when NULL to get the final result.

enter image description here

Query:

select * from tb1 --sample source data
order by date_time

--create temp table #t1 (use original table datatypes)
   create table #t1
   (id int identity not null,
    date_time varchar(50), 
    parameter char(1),
    value float
   )

 insert into #t1  insert input into 
 select * from tb1 order by convert(datetime,date_time)

 select * from #t1;

 SELECT id, t.date_time, t.parameter, t.value, 
   B_over_limit = CASE WHEN t.parameter = 'B' AND t.value > 5.0 THEN 1 
                       WHEN t.parameter = 'B' AND t.value <= 5.0 THEN 0 
                       ELSE null 
                   END 
 into #t2
 FROM #t1 t ORDER BY convert(datetime,t.date_time)

 select * from #t2

 select a.date_time, a.parameter, a.value,
     case when id = 1 and a.B_over_limit is NULL then 0 
     else ISNULL(a.B_over_limit, t.B_over_limit) 
     end as B_over_limit
 from #t2 a
 outer apply
   (select top 1 B_over_limit from #t2 b 
       where b.id < a.id and b.date_time is not null 
         and b.parameter is not null and b.value is not null
         and b.B_over_limit is not null
         and a.B_over_limit is null 
       order by id desc ) t
NiharikaMoola-MT
  • 4,700
  • 1
  • 3
  • 15