
Sample Table

   CREATE TABLE SAMPLE_TABLE (WEEK DATE, TYPE VARCHAR(50), Movie VARCHAR(50), Question VARCHAR(50), Answer VARCHAR(50), value NUMBER(38,0));

Sample Data

INSERT INTO sample_table VALUES('10/1/2020',    'A',    'Contaco',  '1',    'N/A',  '1'),
('10/1/2020',   'A',    'Contaco',  '1',    'Definitely not',   '4'),
('10/1/2020',   'A',    'Contaco',  '1',    'Definitely',   '2'),
('10/1/2020',   'A',    'Contaco',  '1',    'Probably', '2'),
('10/1/2020',   'A',    'Contaco',  '1',    'Maybe',    '1'),
('10/8/2020',   'A',    'Contaco',  '1',    'N/A',  '3'),
('10/8/2020',   'A',    'Contaco',  '1',    'Definitely not',   '1'),
('10/8/2020',   'A',    'Contaco',  '1',    'Definitely',   '2'),
('10/8/2020',   'A',    'Contaco',  '1',    'Probably', '4'),
('10/8/2020',   'A',    'Contaco',  '1',    'Maybe',    '1'),
('10/15/2020',  'A',    'Contaco',  '1',    'N/A',  '2'),
('10/15/2020',  'A',    'Contaco',  '1',    'Definitely not',   '1'),
('10/15/2020',  'A',    'Contaco',  '1',    'Definitely',   '2'),
('10/15/2020',  'A',    'Contaco',  '1',    'Probably', '3'),
('10/15/2020',  'A',    'Contaco',  '1',    'Maybe',    '2'),
('10/1/2020',   'B',    'Contaco',  '1',    'N/A',  '1'),
('10/1/2020',   'B',    'Contaco',  '1',    'Definitely not',   '4'),
('10/1/2020',   'B',    'Contaco',  '1',    'Definitely',   '2'),
('10/1/2020',   'B',    'Contaco',  '1',    'Maybe',    '1'),
('10/8/2020',   'B',    'Contaco',  '1',    'N/A',  '3'),
('10/8/2020',   'B',    'Contaco',  '1',    'Definitely',   '1'),
('10/8/2020',   'B',    'Contaco',  '1',    'Probably', '2'),
('10/8/2020',   'B',    'Contaco',  '1',    'Maybe',    '1'),
('10/15/2020',  'B',    'Contaco',  '1',    'N/A',  '2'),
('10/15/2020',  'B',    'Contaco',  '1',    'Definitely not',   '1'),
('10/15/2020',  'B',    'Contaco',  '1',    'Definitely',   '2'),
('10/15/2020',  'B',    'Contaco',  '1',    'Maybe',    '2') ;
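Note that the INSERT relies on the '10/1/2020' literals being read as MM/DD/YYYY. If they don't load cleanly in your session, pinning Snowflake's date input format first should help (a sketch, assuming the default AUTO date detection is the issue):

-- Assumption: the session uses AUTO date detection; pin the format so that
-- '10/1/2020' is read as October 1, 2020 before running the INSERT above.
ALTER SESSION SET DATE_INPUT_FORMAT = 'MM/DD/YYYY';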

Current query

 Select week, type, movie, question, answer, value,
 LAG(value, 1, 0) OVER (PARTITION BY movie, question, answer, type ORDER BY movie, type, week ASC) AS one_week_prior_value,
 LAG(value, 2, 0) OVER (PARTITION BY movie, question, answer, type ORDER BY movie, type, week ASC) AS two_week_prior_value
 from sample_table ;

With this query I am trying to derive the "One week prior value" and "Two week prior value" for the same type, movie, question, and answer using the LAG function. It works perfectly when Type = A, because the Answer to every Question is present in the data for each week.

The issue is with Type = B, where not all of the "Answer" options are present for each week. Because LAG looks at the previous row within the partition rather than the previous calendar week, it either reaches back to the most recent existing row, or there is simply no row at all for the current week for the missing "Answer". Examples below.

Two Issues:

  1. The row for 10/15/2020 for "Definitely not" should have a "One Week Prior Value" of 0, not 4, because the previous week has no row for "Definitely not".

  2. There should be a row for 10/15/2020 for "Probably" with a "Value" of 0, since it has no value for that week but does have a value in a previous week.

+--------------------------+------+---------+----------+----------------+-------+----------------------+----------------------+
| Month, Day, Year of Week | Type | Movie   | Question | Answer         | Value | One Week Prior Value | Two Week Prior Value |
+--------------------------+------+---------+----------+----------------+-------+----------------------+----------------------+
| 1-Oct-20                 | A    | Contaco | 1        | Definitely     | 2     | 0                    | 0                    |
| 1-Oct-20                 | A    | Contaco | 1        | Definitely not | 4     | 0                    | 0                    |
| 1-Oct-20                 | A    | Contaco | 1        | Maybe          | 1     | 0                    | 0                    |
| 1-Oct-20                 | A    | Contaco | 1        | N/A            | 1     | 0                    | 0                    |
| 1-Oct-20                 | A    | Contaco | 1        | Probably       | 2     | 0                    | 0                    |
| 8-Oct-20                 | A    | Contaco | 1        | Definitely     | 2     | 2                    | 0                    |
| 8-Oct-20                 | A    | Contaco | 1        | Definitely not | 1     | 4                    | 0                    |
| 8-Oct-20                 | A    | Contaco | 1        | Maybe          | 1     | 1                    | 0                    |
| 8-Oct-20                 | A    | Contaco | 1        | N/A            | 3     | 1                    | 0                    |
| 8-Oct-20                 | A    | Contaco | 1        | Probably       | 4     | 2                    | 0                    |
| 15-Oct-20                | A    | Contaco | 1        | Definitely     | 2     | 2                    | 2                    |
| 15-Oct-20                | A    | Contaco | 1        | Definitely not | 1     | 1                    | 4                    |
| 15-Oct-20                | A    | Contaco | 1        | Maybe          | 2     | 1                    | 1                    |
| 15-Oct-20                | A    | Contaco | 1        | N/A            | 2     | 3                    | 1                    |
| 15-Oct-20                | A    | Contaco | 1        | Probably       | 3     | 4                    | 2                    |
| 1-Oct-20                 | B    | Contaco | 1        | Definitely     | 2     | 0                    | 0                    |
| 1-Oct-20                 | B    | Contaco | 1        | Definitely not | 4     | 0                    | 0                    |
| 1-Oct-20                 | B    | Contaco | 1        | Maybe          | 1     | 0                    | 0                    |
| 1-Oct-20                 | B    | Contaco | 1        | N/A            | 1     | 0                    | 0                    |
| 8-Oct-20                 | B    | Contaco | 1        | Definitely     | 1     | 2                    | 0                    |
| 8-Oct-20                 | B    | Contaco | 1        | Maybe          | 1     | 1                    | 0                    |
| 8-Oct-20                 | B    | Contaco | 1        | N/A            | 3     | 1                    | 0                    |
| 8-Oct-20                 | B    | Contaco | 1        | Probably       | 2     | 0                    | 0                    |
| 15-Oct-20                | B    | Contaco | 1        | Definitely     | 2     | 1                    | 2                    |
| 15-Oct-20                | B    | Contaco | 1        | Definitely not | 1     | 4                    | 0                    |
| 15-Oct-20                | B    | Contaco | 1        | Maybe          | 2     | 1                    | 1                    |
| 15-Oct-20                | B    | Contaco | 1        | N/A            | 2     | 3                    | 1                    |
+--------------------------+------+---------+----------+----------------+-------+----------------------+----------------------+

This is what my ideal output should look like. Any suggestions on how to make this happen with the available data? It took me a while to put this post together, so hopefully it will be worth it! Thank you in advance.

  1. You can see that for 10/15/2020, the One Week Prior Value for "Definitely not" is 0, not 4
  2. You can see that for 10/15/2020, the Value for "Probably" is 0

+------------------+------+-----------+----------+----------------+-------+------------------+--------------------+
| Week             | Type | Movie     | Question | Answer         | value | prior week value | 2 week prior value |
+------------------+------+-----------+----------+----------------+-------+------------------+--------------------+
| October 1, 2020  | A    | Contaco   | 1        | N/A            | 1.000 | 0.000            | 0.000              |
| October 1, 2020  | A    | Contaco   | 1        | Definitely not | 4.000 | 0.000            | 0.000              |
| October 1, 2020  | A    | Contaco   | 1        | Definitely     | 2.000 | 0.000            | 0.000              |
| October 1, 2020  | A    | Contaco   | 1        | Probably       | 2.000 | 0.000            | 0.000              |
| October 1, 2020  | A    | Contaco   | 1        | Maybe          | 1.000 | 0.000            | 0.000              |
| October 8, 2020  | A    | Contaco   | 1        | N/A            | 3.000 | 1.000            | 0.000              |
| October 8, 2020  | A    | Contaco   | 1        | Definitely not | 1.000 | 4.000            | 0.000              |
| October 8, 2020  | A    | Contaco   | 1        | Definitely     | 2.000 | 2.000            | 0.000              |
| October 8, 2020  | A    | Contaco   | 1        | Probably       | 4.000 | 2.000            | 0.000              |
| October 8, 2020  | A    | Contaco   | 1        | Maybe          | 1.000 | 1.000            | 0.000              |
| October 15, 2020 | A    | Contaco   | 1        | N/A            | 2.000 | 3.000            | 1.000              |
| October 15, 2020 | A    | Contaco   | 1        | Definitely not | 1.000 | 4.000            | 4.000              |
| October 15, 2020 | A    | Contaco   | 1        | Definitely     | 2.000 | 2.000            | 2.000              |
| October 15, 2020 | A    | Contaco   | 1        | Probably       | 3.000 | 4.000            | 2.000              |
| October 15, 2020 | A    | Contaco   | 1        | Maybe          | 2.000 | 1.000            | 1.000              |
| October 1, 2020  | B    | Contaco   | 1        | N/A            | 1.000 | 0.000            | 0.000              |
| October 1, 2020  | B    | Contaco   | 1        | Definitely not | 4.000 | 0.000            | 0.000              |
| October 1, 2020  | B    | Contaco   | 1        | Definitely     | 2.000 | 0.000            | 0.000              |
| October 1, 2020  | B    | Contaco   | 1        | Probably       | 0.000 | 0.000            | 0.000              |
| October 1, 2020  | B    | Contaco   | 1        | Maybe          | 1.000 | 0.000            | 0.000              |
| October 8, 2020  | B    | Contaco   | 1        | N/A            | 3.000 | 1.000            | 0.000              |
| October 8, 2020  | B    | Contaco   | 1        | Definitely not | 0.000 | 4.000            | 0.000              |
| October 8, 2020  | B    | Contaco   | 1        | Definitely     | 2.000 | 2.000            | 0.000              |
| October 8, 2020  | B    | Contaco   | 1        | Probably       | 4.000 | 0.000            | 0.000              |
| October 8, 2020  | B    | Contaco   | 1        | Maybe          | 1.000 | 1.000            | 0.000              |
| October 15, 2020 | B    | Contaco   | 1        | N/A            | 2.000 | 3.000            | 1.000              |
| October 15, 2020 | B    | Contaco   | 1        | Definitely not | 1.000 | 0.000            | 4.000              |
| October 15, 2020 | B    | Contaco   | 1        | Definitely     | 2.000 | 2.000            | 2.000              |
| October 15, 2020 | B    | Contaco   | 1        | Probably       | 0.000 | 4.000            | 0.000              |
| October 15, 2020 | B    | Contaco   | 1        | Maybe          | 2.000 | 1.000            | 1.000              |
+------------------+------+-----------+----------+----------------+-------+------------------+--------------------+
analytica

2 Answers


You can solve it with a couple of correlated subqueries (equivalent to a left self join):

  select week, type, movie, question, answer, value
   , (select any_value(value)
      from sample_table 
      where week=a.week-7 
      and (movie, question, answer, type) = (a.movie, a.question, a.answer, a.type)
     ) prev1
   , (select any_value(value) 
      from sample_table 
      where week=a.week-14 
      and (movie, question, answer, type) = (a.movie, a.question, a.answer, a.type)
     ) prev2
 from sample_table a
 order by movie, question, answer, type, week;
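
For reference, the left self join mentioned above could be spelled out like this (a sketch; the p1/p2 aliases and the COALESCE to 0 are my additions, mirroring the 0 default in the original LAG calls):

-- Same idea as the correlated subqueries: look up the row 7 and 14 days earlier
-- for the same movie/question/answer/type, defaulting to 0 when it doesn't exist.
select a.week, a.type, a.movie, a.question, a.answer, a.value
 , coalesce(p1.value, 0) as prev1
 , coalesce(p2.value, 0) as prev2
from sample_table a
left join sample_table p1
  on p1.week = a.week - 7
  and p1.movie = a.movie and p1.question = a.question
  and p1.answer = a.answer and p1.type = a.type
left join sample_table p2
  on p2.week = a.week - 14
  and p2.movie = a.movie and p2.question = a.question
  and p2.answer = a.answer and p2.type = a.type
order by a.movie, a.question, a.answer, a.type, a.week;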

(screenshot of the query results)

Thanks for the easy question repro setup!


Now, here it is with the added 0s requested in the comments. I think the SQL is getting too crazy, but since you invested so much in the setup...

with combos as (
    select *
    from (select distinct movie from sample_table) a
    , (select distinct question from sample_table) b
    , (select distinct answer from sample_table) c
    , (select distinct type from sample_table) d
    , (select distinct week from sample_table) e
)

select week, type, movie, question, answer
, ifnull((select any_value(value)
  from sample_table 
  where (week, movie, question, answer, type) = (a.week, a.movie, a.question, a.answer, a.type)
 ), (select 0*max(value)
  from sample_table 
  where (movie, question, answer, type) = (a.movie, a.question, a.answer, a.type)
  and week<a.week
 )) value
, ifnull((select any_value(value)
  from sample_table 
  where (week, movie, question, answer, type) = (a.week-7, a.movie, a.question, a.answer, a.type)
 ), (select 0*max(value)
  from sample_table 
  where (movie, question, answer, type) = (a.movie, a.question, a.answer, a.type)
  and week<a.week
 )) prev1
, ifnull((select any_value(value)
  from sample_table 
  where (week, movie, question, answer, type) = (a.week-14, a.movie, a.question, a.answer, a.type)
 ), (select 0*max(value)
  from sample_table 
  where (movie, question, answer, type) = (a.movie, a.question, a.answer, a.type)
  and week<a.week-7
 )) prev2
from combos a
order by movie, question, answer, type, week;
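
A note on the 0*max(value) subqueries above (my reading of the trick, illustrated against the sample data): MAX over zero rows is NULL, so the expression returns NULL when the combination has no earlier week at all, and 0 when at least one earlier week exists.

-- Fallback behaviour of 0*MAX(value), shown standalone against the sample table:
select 0 * max(value) from sample_table where week < '2020-10-01';  -- no matching rows -> NULL
select 0 * max(value) from sample_table where week < '2020-10-15';  -- matching rows    -> 0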

(screenshot of the query results)

Felipe Hoffa
  • Thank you for your response! I see this query solves issue #1 but do you know how to solve issue #2? For row 10/15/2020, type B, there needs to be a row for "Probably" with a Value of 0, PREV1 of 2, PREV of 0. – analytica Dec 16 '20 at 04:25
  • Check the updated answer, but let me tell you... not the best idea to pursue this with SQL probably. – Felipe Hoffa Dec 16 '20 at 04:51
  • Amazing! Thanks for giving it a try. Curious as to why not to pursue with SQL. Do you have any suggestions on how to handle this? – analytica Dec 16 '20 at 04:56
  • I mean, what's the final goal? In any case, if the solution works for you - please upvote and accept the answer. – Felipe Hoffa Dec 16 '20 at 04:56

You were close. You need to restore the missing B entries, such as the 10/15/2020 "Probably" row, and then apply your window functions. For example, like this:

SELECT
   t.*,
   LAG(t.value, 1, 0) OVER (
       PARTITION BY t.movie, t.question, t.answer, t.type 
       ORDER BY t.movie, t.type, t.week ASC
   ) AS one_week_prior_value,
   LAG(t.value, 2, 0) OVER (
       PARTITION BY t.movie, t.question, t.answer, t.type 
       ORDER BY t.movie, t.type, t.week ASC
   ) AS two_week_prior_value
FROM (
  SELECT
    weeks.week, 
    movies.type, 
    movies.movie, 
    movies.question, 
    movies.answer, 
    COALESCE(t.value,0) AS value
  FROM (
      SELECT DISTINCT week 
      FROM sample_table
  ) weeks
  CROSS JOIN (
      SELECT DISTINCT type, movie, question, answer 
      FROM sample_table
  ) movies
  LEFT JOIN sample_table t 
      ON weeks.week = t.week AND 
         movies.movie = t.movie AND 
         movies.type = t.type AND
         movies.question = t.question AND
         movies.answer = t.answer
) t
id'7238
  • I get an error when I try to run this query. "SQL compilation error: syntax error line 3 at position 40 unexpected 'w'." – analytica Dec 16 '20 at 05:42
  • Look at [Fiddle demo](https://www.db-fiddle.com/f/whfZGwnAJcAYaQii6avF6k/0) – id'7238 Dec 16 '20 at 05:44
  • Wow! This is amazing. Wonder why I am getting an error when trying to run this in Snowflake. Wonder if there is something similar I can do in Snowflake for the last 4 lines of the code starting with "WINDOW w". – analytica Dec 16 '20 at 05:50
  • It's MySQL, didn't see the database tag. Just remove the `WINDOW` clause and use your `LAG` functions as-is. – id'7238 Dec 16 '20 at 05:52
  • Genius!! This is a great solution! – analytica Dec 16 '20 at 05:55
  • I have updated answer without `WINDOW` clause. – id'7238 Dec 16 '20 at 05:59
  • Curious if you would have a solution for this. Let's say I had multiple questions (1,2,3 and so forth) but I only wanted to replicate all missing entries for only question 1 where the answer could be one of the following: definitely, probably, n/a, maybe, definitely not) – analytica Dec 21 '20 at 06:35