-1

I got a code that uses a lot of left join with many tables. When I run this code, it takes more than an hour to run and at the end it gives error with Sort Execution Failure. So, I am thinking of breaking down that left join in multiple steps but I am not sure how to do it and need your help.

The code is as:

Proc sql;
create table newlib.Final_test as 
SELECT 
POpener.Name as Client,
Popener.PartyId as Account_Number,
Case
  When BalLoc.ConvertedRefNo NE '' then BalLoc.ConvertedRefNo
else BalLoc.Ourreferencenum
End as LC_Number,
BalLoc.OurReferenceNum ,
BalLoc.CnvLiabilityCode as Liability_Code,
POfficer.PartyID as Officer_Num,
POfficer.Name as Officer_Name,
POpener.ExpenseCode,
BalLoc.IssueDate as Issue_Date format=mmddyy10.,
BalLoc.ExpirationDate AS Expiry format=mmddyy10.,
BalLoc.LiabilityAmountBase as Total_LC_Balance,
Case
  When BalLoc.Syndicated = 0 Then BalLoc.LiabilityAmountBase
    else 0
End as SunTrust_Non_Syndicated_Exposure,
Case 
  When BalLoc.Syndicated = 1 and BalLoc.PartOutGroupPkey NE 0 Then    
BalLoc.LiabilityAmountBase
    else 0
  End as SunTrust_Syndicated_Exposure,
Case 
  When BalLoc.Syndicated = 1 and BalLoc.PartOutGroupPkey NE 0 Then   
(BalLoc.LiabilityAmountBase - (BalLoc.LiabilityAmountBase *   
(PParty.ParticipationPercent/100)))
  Else BalLoc.LiabilityAmountBase 
End as SunTrust_Exposure,
Case
  When BalLoc.Syndicated = 1 and BalLoc.PartOutGroupPkey <> 0 Then   
(BalLoc.LiabilityAmountBase  * PParty.ParticipationPercent/100)
  Else 0
End as Exposure_Held_By_Other_Banks,
PBene.Name as Beneficiary_Trustee,
cat(put(input(POpener.ObligorNumber,best10.),z10.),put(input 

   (BalLoc.CommitmentNumber,best10.),Z10.)) as Key,
case
when BalLoc.BeneCusip2 NE ' ' then catx 
('|',Balloc.BeneCusip,Balloc.BeneCusip2)
else BalLoc.BeneCusip
End as Cusip,
Case 
  when balLoc.OKtoExpire = 1 then '0' 
  when balLOc.OKtoExpire=0 and BalLoc.AutoExtTermDays NE 0 then put  
(Balloc.AutoExtTermDays,z3.)
  when balLoc.OKtoExpire=0 and BalLoc.AutoExtTermsMonth NE 0 then put  
(balloc.AutoExtTermsMonth,z3.)
  else '000'
End as Evergreen
Case 
when blf.AnnualRate NE 0 then put(blf.AnnualRate,z7.)
when blf.Amount NE 0 then cats('F',put(blf.amount,z7.))
else 'WAIVE'
End as Pricing,

FROM BalLocPrimary BalLoc
Left JOIN Party POpener on POpener.Pkey = BalLoc.OpenerPkey
Left join PartGroup PGroup on BallOC.PartOutGroupPkey = PGroup.pKey
Left join PartParties PParty ON PGroup.pKey = PParty.PartGroupPkey and   
PParty.ParticipationPercent > 0 and
PParty.combined in
(select PPartParties.All_combined  
from PPartParties /*group by PartGroupPkey, PartyPkey*/)

Left Join MemExpenseCodes ExpCodes on POpener.ExpenseCode = ExpCodes.Code
Left JOIN Party PBene on PBene.Pkey = BalLoc.BenePkey
Left join Party POfficer on POfficer.Pkey = BalLoc.AccountOfficerPkey 
left join maxfee on maxfee.LocPrimaryPkey = BalLoc.LocPrimaryPkey
left join BalLocFee BLF on BLF.Pkey = maxfee.pkey
Where BalLoc.LetterType not in ('STBA','EXPA', 'FEE',' ') and 
 BalLoc.LiabilityAmountBase > 0 and BalLoc.irdb = 1
;
quit;

Thank you,

Shankar

shankar
  • 63
  • 1
  • 8
  • Hard to say how to improve it without some statistics. How big are these tables? Are they indexed on the join keys? Where's the SELECT part of the query? – Joe Jul 07 '17 at 14:26
  • @Joe; I just added the whole code including the select statement. these tables have rows between 75,000 to 650,000 and the column between 10 to 40. – shankar Jul 07 '17 at 14:38

1 Answers1

0

A few things I would suggest:

1, for each dataset that you reference, keep only the variables you need to join on, or which get used in the SELECT statement. E.g., from your Party dset, it looks like you only need the Pkey field and Name. Therefore when you make your join to that dset, you should use:

Left JOIN Party(keep=Pkey Name) PBene on PBene.Pkey = BalLoc.BenePkey

2, Push your WHERE statement into the FROM statement like so:

FROM BalLocPrimary(where=(LetterType not in ('STBA','EXPA', 'FEE',' ') and 
 LiabilityAmountBase > 0 and irdb = 1)) BalLoc

And make sure the conditions are in the order of most common to least (barring any index that might be on those 3 fields)

3, You are driving off the BalLocPrimary dataset, left joining to everything else. Is that what you really intend? Is it OK that your result set comes back without a Client or Account_Number? Left Joins can be computationally expensive, and the more you can minimize them, the better.

4, Joe asked about indexes on the join fields. You probably should have some. I have found myself referencing this SUGI paper regularly enough to bookmark it. Similarly, you could review the EXPLAIN PLAN from the query to see where it might be bottlenecking. Another SUGI paper would be a good start.

5, You're right that this could (should?) be broken up into multiple steps. That's a good intuition. However the optimal breaks are going to be highly depending on the underlying data, index, and the join paths. So it's hard to prescribe that from the other side of the screen. I think that second paper I linked could give you some good tips on optimization for your specific case.

Amw 5G
  • 659
  • 5
  • 16