In my database, I have a table with 3 columns: datetime, parameter, value. My goal is to get a subset of the data. I want to get all rows coming after parameter = 'B' and value > 5 (include this row in the result set). Leave out all rows coming after parameter = 'B' and value <=5 (include this row in the result set).
source table and expected result
In other words, I want to use VALUE in rows with parameter = B as a flag (pseudocode):
include_flag = 0
result_set = empty table
for row in rows:
if parameter = B and value > 5:
result_set.append(row)
include_flag = 1
elif parameter = B and value <= 5:
result_set.append(row)
include_flag = 0
elif parameter <> B:
if include_flag = 1:
result_set.append(row)
elif include_flag = 0:
skip(row)
source table:
date_time | parameter | value |
---|---|---|
1.9.2021 12:34:00 | A | 0.50 |
2.9.2021 14:01:00 | B | 7.40 |
2.9.2021 21:52:00 | C | 85.40 |
3.9.2021 3:15:00 | B | 3.80 |
4.9.2021 1:42:00 | C | 67.30 |
5.9.2021 12:34:00 | A | 0.3 |
6.9.2021 12:34:00 | C | 76.50 |
6.9.2021 17:22:00 | A | 0.40 |
6.9.2021 19:37:00 | B | 8.10 |
7.9.2021 12:34:00 | C | 91.70 |
7.9.2021 22:12:00 | C | 87.60 |
8.9.2021 7:17:00 | A | 0.60 |
9.9.2021 5:34:00 | B | 5.80 |
9.9.2021 12:34:00 | B | 4.90 |
10.9.2021 19:56:00 | A | 0.60 |
desired result set:
date_time | parameter | value |
---|---|---|
2.9.2021 14:01:00 | B | 7.40 |
2.9.2021 21:52:00 | C | 85.40 |
3.9.2021 3:15:00 | B | 3.80 |
6.9.2021 19:37:00 | B | 8.10 |
7.9.2021 12:34:00 | C | 91.70 |
7.9.2021 22:12:00 | C | 87.60 |
8.9.2021 7:17:00 | A | 0.60 |
9.9.2021 5:34:00 | B | 5.80 |
9.9.2021 12:34:00 | B | 4.90 |
Also, I am interested in a solution leading to an alternative result set, which has the same number of rows as the source table, but contains only those values that are included in the primary result set. (so the rows coming after (parameter = 'B' and value <=5) are also included, just the values are dismissed)
- How would the solutions differ?
alternative result set:
date_time | parameter | value |
---|---|---|
1.9.2021 12:34:00 | A | NaN |
2.9.2021 14:01:00 | B | 7.40 |
2.9.2021 21:52:00 | C | 85.40 |
3.9.2021 3:15:00 | B | 3.80 |
4.9.2021 1:42:00 | C | NaN |
5.9.2021 12:34:00 | A | NaN |
6.9.2021 12:34:00 | C | NaN |
6.9.2021 17:22:00 | A | NaN |
6.9.2021 19:37:00 | B | 8.10 |
7.9.2021 12:34:00 | C | 91.70 |
7.9.2021 22:12:00 | C | 87.60 |
8.9.2021 7:17:00 | A | 0.60 |
9.9.2021 5:34:00 | B | 5.80 |
9.9.2021 12:34:00 | B | 4.90 |
10.9.2021 19:56:00 | A | NaN |
- What is the optimal solution to this problem(s)?
Any help would be appreciated.
EDIT:
query:
SELECT t.parameter, t.date_time, t.value, B_over_limit = CASE
WHEN t.parameter = 'B' AND t.value > 5.0 THEN 1
WHEN t.parameter = 'B' AND t.value <= 5.0 THEN 0
ELSE null END
FROM table t ORDER BY t.date_time
brings:
date_time | parameter | value | B_over_limit |
---|---|---|---|
1.9.2021 12:34:00 | A | 0.50 | null |
2.9.2021 14:01:00 | B | 7.40 | 1 |
2.9.2021 21:52:00 | C | 85.40 | null |
3.9.2021 3:15:00 | B | 3.80 | 0 |
4.9.2021 1:42:00 | C | 67.30 | null |
5.9.2021 12:34:00 | A | 0.3 | null |
6.9.2021 12:34:00 | C | 76.50 | null |
6.9.2021 17:22:00 | A | 0.40 | null |
6.9.2021 19:37:00 | B | 8.10 | 1 |
7.9.2021 12:34:00 | C | 91.70 | null |
7.9.2021 22:12:00 | C | 87.60 | null |
8.9.2021 7:17:00 | A | 0.60 | null |
9.9.2021 5:34:00 | B | 5.80 | 1 |
9.9.2021 12:34:00 | B | 4.90 | 0 |
10.9.2021 19:56:00 | A | 0.60 | null |
- filling all nulls with last previous 0 or 1 would be almost the final result. How can I achieve that?
- i.e. using the previous query as a subquery s, and then something like(?):
SELECT s.parameter, s.date_time, s.values,
LAST_VALUE(s.B_over_limit) OVER(...) FROM subquery s ORDER BY date_time