1

Assume I have historical data that lists items which show defects each day. Items drop in and out of this list. I.E.-

TABLE NAME: ITEMS_WITH_DEFECTS

DAY        | ITEMID
-------------------
01-JAN-16  | A
01-JAN-16  | D
02-JAN-16  | B
02-JAN-16  | D
03-JAN-16  | A
03-JAN-16  | C
04-JAN-17  | A
04-JAN-17  | D

I'd like to build a data source that shows for each day relative to the previous day, how many items are new, have dropped, and have carried over. It'd like to do this without losing any information from my history data source, so my desired output is:

TABLE NAME: ITEM_DEFECT_TRENDS

DAY       | ITEMID | DEFECT | TREND
------------------------------------
01-JAN-16 | A      | y      | New
01-JAN-16 | B      | n      | (null)
01-JAN-16 | C      | n      | (null)
01-JAN-16 | D      | y      | New
02-JAN-16 | A      | n      | Dropped
02-JAN-16 | B      | y      | New
02-JAN-16 | C      | n      | (null)
02-JAN-16 | D      | y      | Carryover
03-JAN-16 | A      | y      | New
03-JAN-16 | B      | n      | Dropped
03-JAN-16 | C      | y      | New
03-JAN-16 | D      | n      | Dropped
04-JAN-16 | A      | y      | Carryover
04-JAN-16 | B      | n      | (null)
04-JAN-16 | C      | n      | Dropped
04-JAN-16 | D      | y      | New

I know how to produce the TREND column, but I don't know how to add the rows or DEFECT column to the output.

Is this something I can do with SQL or PL SQL?

The reason I'd like to model the data like this is because my database contains a large list of about 12,000 possible items, but only 500 or so items will show up with defects at a time. It's more memory efficient to scope this with only the items I have to consider versus all of the possible items.

Ryan Barker
  • 113
  • 2
  • 11
  • SQL should suffice. You need a list of all `itemid` ("list" means a table that has all the `itemid`, ideally as a primary key or a unique, non-null key), since you must show E also, which has never had a defect in your `defects` table. (Otherwise your requirement is not very logical.) You also need to indicate over what time period this should be shown. If on a date the `defects` table shows no rows, that means there were no defects on that date - correct? Lastly: if you "dictate" a first day in your report, all the items will be considered "not to have had a defect on the prior date"? –  Mar 02 '17 at 00:49
  • I ask the last question because for A on January 1 you show New. How do you know it didn't also have a defect on December 31? If it should all be self-contained, it is "New" because the world only came into existence on January 1. Is that the position you need to take? –  Mar 02 '17 at 00:50
  • Thanks for the fast response. I should elaborate more on the defect history: It's a 14 day history ending at sysdate which updates daily. Of the set of all possible item IDs (which would include E), it usually has around 500 that show up at a time and 11,500 that never flag. The timeframe of the trending should match the timeframe of the input, and all items are assumed to not have had a defect prior to min(day) in the history. Since the trending only applies to a very small subset of items from my possible set, I only want to use items I absolutely need in the output. Does that make sense? – Ryan Barker Mar 02 '17 at 01:03
  • "New" just means "New" compared to the previous day. All items on the first day are "New". – Ryan Barker Mar 02 '17 at 01:04

1 Answers1

1

Yes, I think you can do this with SQL:

select d.day, i.itemid, 
       (case when id.itemid is not null then 'y' else 'n' end) as defect,
       (case when id.itemid is null and
                  lag(id.itemid) over (partition by i.itemid order by d.day) is null
             then 'New'
             when id.itemid is not null and
                  lag(id.itemid) over (partition by i.itemid order by d.day) is not null
             then 'CarryOver'
             when lag(id.itemid) over (partition by i.itemid order by d.day) is not null
             then 'Dropped'
        end) as trend
from (select distinct day from items_with_defects) d cross join
     (select distinct itemid from items_with_defects) i left join
     items_with_defects id
     on id.day = d.day and id.itemid = i.itemid;

The idea is to generate all the rows using a cross join -- if you have other ways of getting the population of days and items that you want, then use them.

Then left join the original data to check for matches. The defect column is easy. I think the trend column logic is correct, but the question does not fully explain it.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • This is probably insufficient - you need to densify the data first. There may be a day on which no item had any defects. With the current query, that day wouldn't even show up in the output. –  Mar 02 '17 at 01:19
  • Gordon, thank you! I've only been using SQL since May 2016 and didn't know about cross-joins. Your logic for trending is fine - New means in current day but not previous, Carryover means in both days, and Dropped means in previous day but not current. @mathguy The procedure that builds the history table will place a single entry with the day and a (null) in the case you are describing. It ensures there is at least one record for every day. – Ryan Barker Mar 02 '17 at 01:29