3

SQL. How can I join table to itself to get desired results just as shown in the table below. The logic is that I want to have Units for the same product and corresponding month of previous year.

The simple left join on source table to itself on key a.[year]=b.[year]+1 (and of course month to month and product to product) would cause the loss of the data where we had values in the previous year and do not have now.

enter image description here

Przemyslaw Remin
  • 6,276
  • 25
  • 113
  • 191

5 Answers5

4

A full join should be sufficient

  select distinct
    coalesce(a.year, b.year+1) as year
    , coalesce(a.month, b.month) as month
    , coalesce(a.product, b.product) as product
    , a.units as units
    , b.units as units_prev
  from yourtable a
  full join yourtable b on a.[year] = b.[year]+1 and a.[month] = b.[month] and a.product = b.product

Your expected results though are slightly off from the description 2018, month 2, product 2 does not exist with a prior value of 2933.

DB Fiddle : https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=d01dc5bd626854b083be0864f2d5b0e4

Result :

year    month   product units   units_prev
2017    1       1       1721    
2017    2       1       4915    
2017    4       2       2933    
2017    5       1       5230    
2018    1       1               1721
2018    1       2       7672    
2018    2       1       5216    4915
2018    3       1       8911    
2018    4       2               2933
2018    5       1               5230
2019    1       2               7672
2019    2       1               5216
2019    3       1               8911

If you need to filter out futures like that, then you can add an additional where predicate, something like :

where coalesce(a.year, b.year+1) <= year(getdate())
Andrew
  • 26,629
  • 5
  • 63
  • 86
  • Surprising! You do not join on [month]? Is that so or you forgot about it? – Przemyslaw Remin Nov 19 '18 at 10:38
  • ok, what was catching me out was the coalesce of the year, it should of had b,year+1 - technically, in 2019, month 3 for product 1 there is a units_prev of 8911 - but you don't give it in your expected results – Andrew Nov 19 '18 at 11:07
2

year month

Use cross join to generate the rows, left join to bring in the data and then lag() to get the "previous" value:

select y.year, m.month, p.product, t.units,
       lag(t.units) over (partition by p.product, m.month order by y.year) as prev_units
from (select distinct year from t) y cross join
     (select distinct month from t) m cross join
     (select distinct product from t) p left join
     t
     on t.year = y.year and t.month = m.month and t.product = p.producct;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • I like its simplicity but it should be improved. It produces three nulls in the product column on my data. In my expected results there should be no rows for the months of the product 2 where there is no sales in neither 2017 and 2018. I think your Cartesian would produce rows for that. Can you please put aliases in the second line of your code? – Przemyslaw Remin Nov 19 '18 at 12:10
  • @PrzemyslawRemin . . . Yes, I see that you also want the `product` to be cross joined as well. The query is fixed. – Gordon Linoff Nov 19 '18 at 13:13
  • From the expected results shown, I gather they are rather looking for `(select distinct year from t) y cross join (select distinct month, product from t) mp`. – Thorsten Kettner Nov 19 '18 at 13:26
  • @GordonLinoff Almost done. Is additional `where coalesce(units,prev_units) is not null` the best way to get your query to desired results? I do not want the rows where there are no sales for this and previous year. The are some products that have sales only in specific months of the year. Since source table is couple of milion rows I do not want the weeds at once if possible. – Przemyslaw Remin Nov 19 '18 at 13:39
  • @PrzemyslawRemin . . . Yes, I think that will do it. – Gordon Linoff Nov 19 '18 at 14:00
  • I am scared of the lag function. I read that its performance is worse then simple self-join. Since your idea is based on Cartesian product and then clearance of the empty rows (no sales for current and previous year) then I think the lag function could be replaced with another self-left join: `left join t as t_prev on t_prev.year+1 = y.year and t_prev.month = m.month and t_prev.product = p.producct`. Do you agree? – Przemyslaw Remin Nov 20 '18 at 09:52
  • @PrzemyslawRemin . . . Performance of lag should not, in general, be worse than a self-join. You can, of course, try both versions to see which works better on your data. – Gordon Linoff Nov 20 '18 at 13:00
1

I would go with LAG, and a calendar table.

SELECT C.[Year],
       C.[Month],
       YPT.product,
       YST.units,
       YST.LAG(YST.units) OVER (PARTITION BY YTP.[product],C.[month] ORDER BY C.[year]) AS UnitsPrev
FROM CalendarTable C
     CROSS JOIN YourProductTable YPT
     LEFT JOIN YourSourceTable YST ON C.[Year] YST.[Year]
                                  AND C.[Month] = YST.[Month]
                                  AND YPT.Product = YST.Product
WHERE C.[day] = 1
  AND C.[date] BETWEEN {SomeStartDate} AND {SomeEndDate];

This guessing a little on your design (it assumes you have a product table).

Thom A
  • 88,727
  • 11
  • 45
  • 75
  • 1
    I do not have [Day]. – Przemyslaw Remin Nov 19 '18 at 10:57
  • @PrzemyslawRemin `C.[Day] = 1` not `YST.[Day]` – Thom A Nov 19 '18 at 11:01
  • If, however, you mean that your Calendar Table doesn't have a `day` column, @PrzemyslawRemin my question is why? That's a very important column to have in your calendar table, and I would suggest you make changes to it to include it. – Thom A Nov 19 '18 at 11:18
  • I do not have [Day] column. Simply because I work on monthly data. My [Units] column is aggregated to full months. – Przemyslaw Remin Nov 19 '18 at 12:14
  • So, @PrzemyslawRemin ,in your calendar table, are you only storing the first day of the month? If so, then just remove `AND C.[Day] = 1` from the `ON` clause. If not, then why isn't `day` in your calendar table if your have a calendar table with every day of the month in it. However, based on your above statement, why does your **calendar** table NOT have every **calendar** day in it? That's why it's called a **calendar** table. – Thom A Nov 19 '18 at 12:17
  • I did not say that I have a calendar table at all. Just the source table you see above. Though I do not deny it might be a smart move to introduce calendar table. – Przemyslaw Remin Nov 19 '18 at 12:22
  • Then I don't understand your comments still @PrzemyslawRemin . Like I said it's `C.[Day] = 1`. `C` is an alias for `CalendarTable`. It makes no reference to `YourSourceTable`. It doesn't matter that your table doesn't have a `[day]` column, I don't try to make use of one. – Thom A Nov 19 '18 at 12:25
1

You could generate all possible combinations for year, month and product in your data using CROSS JOIN. A simple LEFT JOIN will give you the value or NULL if data for a specific combination exists.

DECLARE @t TABLE (year int, month int, product int, unit int);
INSERT INTO @t VALUES
(2017, 1, 1, 1721),
(2017, 2, 1, 4915),
(2017, 5, 1, 5230),
(2018, 2, 1, 5216),
(2018, 3, 1, 8911),
(2017, 4, 2, 2933),
(2018, 1, 2, 7672);

SELECT ally.year, allm.month, allp.product, curr.units, prev.units AS units_prev
FROM (SELECT DISTINCT year FROM @t) AS ally
CROSS JOIN (SELECT DISTINCT product FROM @t) AS allp
CROSS JOIN (SELECT DISTINCT month FROM @t) AS allm
LEFT JOIN @t AS curr ON curr.year = ally.year AND curr.product = allp.product AND curr.month = allm.month
LEFT JOIN @t AS prev ON prev.year = ally.year - 1 AND prev.product = allp.product AND prev.month = allm.month

Result:

| year | month | product | units | units_prev |
|------|-------|---------|-------|------------|
| 2017 | 1     | 1       | 1721  | NULL       |
| 2017 | 2     | 1       | 4915  | NULL       |
| 2017 | 3     | 1       | NULL  | NULL       |
| 2017 | 4     | 1       | NULL  | NULL       |
| 2017 | 5     | 1       | 5230  | NULL       |
| 2017 | 1     | 2       | NULL  | NULL       |
| 2017 | 2     | 2       | NULL  | NULL       |
| 2017 | 3     | 2       | NULL  | NULL       |
| 2017 | 4     | 2       | 2933  | NULL       |
| 2017 | 5     | 2       | NULL  | NULL       |
| 2018 | 1     | 1       | NULL  | 1721       |
| 2018 | 2     | 1       | 5216  | 4915       |
| 2018 | 3     | 1       | 8911  | NULL       |
| 2018 | 4     | 1       | NULL  | NULL       |
| 2018 | 5     | 1       | NULL  | 5230       |
| 2018 | 1     | 2       | 7672  | NULL       |
| 2018 | 2     | 2       | NULL  | NULL       |
| 2018 | 3     | 2       | NULL  | NULL       |
| 2018 | 4     | 2       | NULL  | 2933       |
| 2018 | 5     | 2       | NULL  | NULL       |
Salman A
  • 262,204
  • 82
  • 430
  • 521
  • Hi Salman, (1) you do not have [year] column in your final result. (2) I do not understand the clause `WHERE year = 2018 OR year = 2018 - 1` (it means select *). (3) [Year] column values are exemplary in source table. There are [Year] as 2016, 2015.... Please do not refer to values. (4) Can you please explain how your Cartesian product produces just 10 rows instead of 20 (see. answer of Gordon Linoff)? – Przemyslaw Remin Nov 20 '18 at 09:09
  • Sorry I thought we're talking about two years only. It could be made to work with n years easily but `LAG` is better than two `LEFT JOINS`. – Salman A Nov 20 '18 at 09:20
  • Why do you think `LAG` is better then two `LEFT JOINS`? The performance seems to be in favor of self-join: https://dba.stackexchange.com/questions/158374/performance-comparison-between-using-join-and-window-function-to-get-lead-and-la – Przemyslaw Remin Nov 20 '18 at 09:41
  • It depends. But I've revised my answer nevertheless. It is now exactly the same as the other except that it uses LEFT JOIN to find previous row. Plan looks slightly better. – Salman A Nov 20 '18 at 09:44
0

If you want the rows where nothing was sold in both 2017 and 2018 as in your desired results for March 2017 as well you need to generate out the month, year and join in the product to get the null values.

This query does it for the month and year, hopefully you should be able to add the Product as well if required

DECLARE @startMonth INT=1
DECLARE @endMonth INT=12
DECLARE @startYear INT=2017
DECLARE @endYear INT=2018
;
WITH months AS (
    SELECT @startMonth AS m
    UNION ALL
    SELECT m+1 FROM months WHERE m+1<=@endMonth
),
years AS (
    SELECT @startYear AS y
    UNION ALL
    SELECT y+1 FROM years WHERE y+1<=@endYear
),
monthYears AS (
    SELECT m, y
    FROM months, years
)
SELECT  thisYear.[Year], thisYear.[Month], thisYear.[Product], thisYear.[Units], prevYear.[Units] as units_prev
FROM 
    (SELECT [Product], my.y as [Year], my.m as [Month], [Units]
    FROM monthYears my
    LEFT JOIN sales on my.m = [Month] and my.y = [Year]) as thisYear
LEFT OUTER JOIN     
    (SELECT [Product], my.y as [Year], my.m as [Month], my.y + 1 as NextYear, [Units]
    FROM monthYears my
    LEFT JOIN sales on my.m = [Month] and my.y = [Year])  as prevYear 
    on thisYear.Product = prevYear.Product
        and (thisYEAR.[Year]) = prevYear.[NextYear]
        and thisYEAR.[Month] = prevYear.[Month]
ORDER BY thisYear.[Year], thisYear.[Month], thisYear.[Product] 
option (maxrecursion 12);
PhilS
  • 624
  • 3
  • 5