0

I have a set of data containing invoice information. The goal is to find the credit and rebill invoice number ('InvcNbrKey') associated with a particular 'Standard Order' line. Think of it as a person orders an item which generate a line with 'OrdTypeName' = 'Standard Order', but later it was credited ('Rebill CM Req')and rebilled ('Rebill DM Req').

I attempted solving this problem by using Row_Number over Partition to isolate all transactions that share the same NDC, CustPO# and Location to put these transactions into the order which is described above: "Standard Order" followed by 'Rebill CM Req' then 'Rebill DM Req' so that I can eventually use Lead Function to return the correct Invoice Number for the credit

Select ROW_NUMBER() over (Partition by NDC,[CustPO#],[Location] Order by NDC,CustPO#,OrdNbrKey) as RowNumber
,*
From Invoices

Then this two lead functions to find the invoice and the date associated:

        Case when (Lead([NetSales$],1) over (Partition by  NDC,[CustPO#],[Location] Order by NDC asc, CustPO#) = [NetSales$]*(-1))                
        
        Then concat('Credited with Invoice #',Lead([InvcNbrKey],1)  over (Partition by  NDC,[CustPO#],[Location] Order by [InvcDateKey]),' on '
            ,Lead([InvcDateKey],1)  over (Partition by  NDC,[CustPO#],[Location] Order by [InvcDateKey]) )  
        
        Else '' END as [HasThisBeenCredited]
    ,Case when (Lead([OrdTypeName],2) over (Partition by  NDC,[CustPO#],[Location] Order by NDC asc, CustPO#) = 'Rebill DM Req')

        Then concat('Rebilled with Invoice #',Lead([InvcNbrKey],2)  over (Partition by  NDC,[CustPO#],[Location] Order by [InvcDateKey]),' on '
            ,Lead([InvcDateKey],2)  over (Partition by  NDC,[CustPO#],[Location] Order by [InvcDateKey]) )  
        
        Else '' END as [HasThisBeenRebilled]

'''

It looks like this enter image description here

But my desired return is this: enter image description here

If my desired return is achieved, then I use lead function to return this information: enter image description here

The problem is I cannot produce the desired return. I had great success using this approach if there are only three lines after the isolation using Partition by. However, when there are 6 lines. The problem is with putting the rows in the particular sequence described. Maybe this problem can be solved using an entire different approach rather than using row_number over partition by and lead function.

Any suggestion is appreciated.

Here is the sql for creating this table

INSERT INTO Invoices (InvoiceDate,OrdNbrKey,OrdTypeName,CustPO#,NDC,InvcNbrKey,InvoicePrice,HistoricWAC) VALUES ('1/25/2023','4608548197','Standard Order','MH01252023','50419039501','3121043683','7881.5','7881.5');
INSERT INTO Invoices (InvoiceDate,OrdNbrKey,OrdTypeName,CustPO#,NDC,InvcNbrKey,InvoicePrice,HistoricWAC) VALUES ('1/26/2023','4608548218','Standard Order','MH01252023','50419039501','3121044236','7881.5','7881.5');
INSERT INTO Invoices (InvoiceDate,OrdNbrKey,OrdTypeName,CustPO#,NDC,InvcNbrKey,InvoicePrice,HistoricWAC) VALUES ('4/13/2023','5148834452','Rebill CM Req','MH01252023','50419039501','352040045','7881.5','-7881.5');
INSERT INTO Invoices (InvoiceDate,OrdNbrKey,OrdTypeName,CustPO#,NDC,InvcNbrKey,InvoicePrice,HistoricWAC) VALUES ('4/13/2023','5148834454','Rebill DM Req','MH01252023','50419039501','352040046','8428.26','8428.26');
INSERT INTO Invoices (InvoiceDate,OrdNbrKey,OrdTypeName,CustPO#,NDC,InvcNbrKey,InvoicePrice,HistoricWAC) VALUES ('4/13/2023','5148837317','Rebill CM Req','MH01252023','50419039501','352043059','7881.5','-7881.5');
INSERT INTO Invoices (InvoiceDate,OrdNbrKey,OrdTypeName,CustPO#,NDC,InvcNbrKey,InvoicePrice,HistoricWAC) VALUES ('4/13/2023','5148837318','Rebill DM Req','MH01252023','50419039501','352043060','12866.4','12866.4');
Dale K
  • 25,246
  • 15
  • 42
  • 71
Thien Pham
  • 43
  • 1
  • 9
  • It looks like, you also want to take in account the `OrdTypeName` in the `PARTITION`. Also I can't see column `Location` in your output. Maybe it could help, if you change the partition part to `(PARTITION BY NDC,[CustPO#],[Location],[OrdTypeName] ORDER BY NDC,CustPO#,OrdNbrKey,[OrdTypeName])`. – Raphi1 Apr 30 '23 at 15:35
  • Thanks for your comment, Raphael. Location is almost certainly a non-issue here but I left it in my code to be abundantly cautious. Regarding, OrdTypeName, if I add this, then I cannot group in the transactions that are credits and debit related to the "Standard Order", in which case, i would not be able to use the lead function the way it is written. Still, the goal is to get the desired result in the screenshot with row number 1, 2, 3 in the right spot. I suspect there is an entirely different approach to this problem but I am a bit new to SQL to be able to tell. – Thien Pham Apr 30 '23 at 16:05
  • I think, with the current data, it's not possible. You can never be sure, which `Rebill` is assigned to which `Standard Order`, if there is more than one (three entries). I think, you need some link between them three entries. However, then you don't need the query at all because then you know, how the are related. How do you know in the exiting example, if `4608548218` is not credited by `352040045` instead of `352043059`? The amount also is the same here, since you're only ordering by the invoice date and all the `Rebill` have the same. Or are there some relations in the data I don't see? – Raphi1 Apr 30 '23 at 16:53
  • That's why one should "bind" credits to invoices using some sort of extra field and not trying to figure it out runtime. – siggemannen Apr 30 '23 at 17:27
  • I agree with you. What if the best we could do is to return two possibilities that this could be credited by either 352040045 or 352043059 and debited by either 352040046 or 352043060? Would you have an idea how i would attempt this? – Thien Pham Apr 30 '23 at 18:30
  • You can achieve this by writing `SELECT InvcNbrKey FROM Invoices WHERE OrdTypeName = 'Rebill CM Req' GROUP BY NDC,[CustPO#],InvcNbrKey`. Just change the `OrdTypeName` for the other value – Raphi1 Apr 30 '23 at 18:35
  • @Raphael, I will have to think about integrating your suggestion into the current script. I am trying to imagine how this could be placed in the first two columns of the final desired return 'HasThisBeenCredited' and 'HasThisBeenDebited' in place of the current 'Case When' structure. Do you see a solution where I can force the issue to reflect the desired sorting (row numbers 1,2,3,1,2,3) without regard for which InvcNbrKey they might be as long as it is one of the two 'Rebill CM" and 'Rebill DM' set? – Thien Pham Apr 30 '23 at 19:24

1 Answers1

0

You can try something like this:

SELECT * FROM
(
  SELECT
  ROW_NUMBER() over (Partition by a.NDC,a.[CustPO#], i1.[OrdNbrKey] ORDER BY i1.InvoiceDate, a.OrdNbrKey) as RowNumber,
  a.InvoiceDate,
  a.OrdNbrKey,
  a.InvcNbrKey,
  a.InvoicePrice,
  a.HistoricWAC,
  a.OrdTypeName,
  a.NDC,
  a.[CustPO#]
  FROM Invoices AS i1
  INNER JOIN (
    SELECT * FROM Invoices
  ) AS a ON i1.NDC = a.NDC AND i1.[CustPO#] = a.[CustPO#] AND (i1.[OrdNbrKey] = a.[OrdNbrKey] OR i1.[OrdTypeName] <> a.[OrdTypeName])
  WHERE i1.OrdTypeName = 'Standard Order'
) AS b WHERE b.RowNumber < 4

You can play a little with the conditions, to see all the data. This will most likely not produce the right data for you, as now the Credit and Debit is the same value for both orders. But as mentioned before, since they are not linked, it's not possible, to link the correct ones.

For the explanation of the query, you can see the self join here. If you would just select all columns from the inner select, you can see, i1.OrdNbrKey will produce our missing to partition by. The inner where condition will make sure, we get our own Standard Order but not the others, and the other types.

If you remove the most outer where condition, you will get a group of row numbers from 1 to 5, as it will produce one Standard Order and two Rebill CM and Rebill DM each. Hopefully this will help you.

i1 is our hidden table, just for grouping/partitioning/ordering

Result table:

RowNumber InvoiceDate OrdNbrKey InvcNbrKey InvoicePrice HistoricWAC OrdTypeName NDC CustPO#
1 2023-01-25 4608548197 3121043683 7881.5 7881.5 Standard Order 50419039501 MH01252023
2 2023-04-13 5148834452 352040045 7881.5 -7881.5 Rebill CM Req 50419039501 MH01252023
3 2023-04-13 5148834454 352040046 8428.26 8428.26 Rebill DM Req 50419039501 MH01252023
1 2023-01-26 4608548218 3121044236 7881.5 7881.5 Standard Order 50419039501 MH01252023
2 2023-04-13 5148834452 352040045 7881.5 -7881.5 Rebill CM Req 50419039501 MH01252023
3 2023-04-13 5148834454 352040046 8428.26 8428.26 Rebill DM Req 50419039501 MH01252023

Without outer WHERE clause

RowNumber InvoiceDate OrdNbrKey InvcNbrKey InvoicePrice HistoricWAC OrdTypeName NDC CustPO#
1 2023-01-25 4608548197 3121043683 7881.5 7881.5 Standard Order 50419039501 MH01252023
2 2023-04-13 5148834452 352040045 7881.5 -7881.5 Rebill CM Req 50419039501 MH01252023
3 2023-04-13 5148834454 352040046 8428.26 8428.26 Rebill DM Req 50419039501 MH01252023
4 2023-04-13 5148837317 352043059 7881.5 -7881.5 Rebill CM Req 50419039501 MH01252023
5 2023-04-13 5148837318 352043060 12866.4 12866.4 Rebill DM Req 50419039501 MH01252023
1 2023-01-26 4608548218 3121044236 7881.5 7881.5 Standard Order 50419039501 MH01252023
2 2023-04-13 5148834452 352040045 7881.5 -7881.5 Rebill CM Req 50419039501 MH01252023
3 2023-04-13 5148834454 352040046 8428.26 8428.26 Rebill DM Req 50419039501 MH01252023
4 2023-04-13 5148837317 352043059 7881.5 -7881.5 Rebill CM Req 50419039501 MH01252023
5 2023-04-13 5148837318 352043060 12866.4 12866.4 Rebill DM Req 50419039501 MH01252023
Raphi1
  • 322
  • 1
  • 9
  • Thank you for showing me in great detail your thoughts. I kind of like the query WITH the 'Where' clause but as you pointed out, no differentiation between the 1st set of row 2,3 and the 2nd set of row 2,3 - as in, there is no awareness that there is a second set of CM and DM exists. W/out outer 'WHERE', Invoices '352043059' and '352043060' remains but also create a duplication on each. I can overcome this by a lead function with proper offset (4 and 5), but the trouble is I don't know to write a dynamic lead function if same scenario has 3+ sets of OrdTypeName rather than 2. – Thien Pham Apr 30 '23 at 22:43
  • Even if you could write a dynamic lead function, you don't know, which pair of CM and DM is the correct one. Select * from the inner query and try to find some combinations/criteria of columns, from which you can determine the correct pair. – Raphi1 Apr 30 '23 at 22:57