1

Using SQL Server 2012 I have a table called Allbucket

CustodianAccountNum symbol  EndDate ManagerName MarketValue NetReturn
A9G040819   wabix   12/31/2013  GMO Benchmark   34751.10987 0.004072
A9G040819   wabix   1/31/2014   GMO Benchmark   34128.88767 -0.017905
A9G040819   wabix   2/28/2014   GMO Benchmark   49969.8081  0.0202
A9G040819   wabix   3/31/2014   GMO Benchmark   50370.993   0.008028
A9G040819   wabix   4/30/2014   GMO Benchmark   50995.0584  0.012389
A9G040819   amj 12/31/2013  JPMorgan Alerian    1234.55 -0.008154
A9G040819   amj 2/28/2014   JPMorgan Alerian    14849.76    -0.018599
A9G040819   amj 3/31/2014   JPMorgan Alerian    14892.8 0.015203
A9G040819   amj 4/30/2014   JPMorgan Alerian    15513.6 0.041684

I am trying to load this data from one system to another. However it requires per a given CustodianAccountNum, for all symbols to have the same date intervals during periods of time where they both exist.

Note that amj is missing 1/31/2014. The clue is that at least one other security, in this case wabix has that date during the same span of time. Also note sometimes dates are intramonth such as 1/15/2014

I was hoping to do something like a self join and partition where I take all possible distinct dates for a given CustodianAccountNum then force all rows to have the same periodicity during the span of time they overlap. For the interpolated rows that were not original and 'borrowed' from another symbol existing in that span of time I would like to pull the LAG marketvalue from the previous row of that symbol (if a previous row exists if not then 0) and force all other values to zero. There are additional columns in the original data but I'm trying to keep this example simple.

So Ideally AMJ would look like this because wabix has the 1/31/2014 date

CustodianAccountNum symbol  EndDate ManagerName MarketValue NetReturn
   A9G040819    amj 12/31/2013  JPMorgan Alerian    1234.55 -0.008154
A9G040819       amj 1/31/2014   JPMorgan Alerian    1234.55 -0.0
    A9G040819   amj 2/28/2014   JPMorgan Alerian    14849.76    -0.018599
    A9G040819   amj 3/31/2014   JPMorgan Alerian    14892.8 0.015203
    A9G040819   amj 4/30/2014   JPMorgan Alerian    15513.6 0.041684

the guiding principle of missing dates is if any other Symbol has a date paritioned by a given custodianaccountnum. There are thousands of different accountnums but they only need to align per a given accountnum

My concern is only with date gaps during the lifetime of the Symbol per account. If another symbol name exists years before it I don't need many months of 0 added. I just need them in sync from the first to the last date of a given symbol across all Symbols that overlap in time.

UPDATE

Gordon Linoff's reply gets me close but not quite there. I had to change OUTER APPLY to CROSS apply or I was getting thousands of null records in all columns.

I've amended the query to show all the desired columns however this query is resulting in all but market value being = 0. Basically I want to force all the values to 0 for the derived row (1/31/2014 in my example) except market vlaue which I'd like to pull from the previous market value. However for all non derived rows I want to use the original values across the entire row.

select 

ab.drank,d.EndDate,ab.BranchName,ab.EntityID,ab.CustodianAccountNum,ab.AccountID,ab.ManagerName,
ab.FTAssetStyle,ab.FTAssetClass,ab.PWMSecurityID,ab.AssetClassCode,ab.AssetClass,ab.Symbol,ab.SecType,
ab.Cusip,ab.Held,ab.MarketValue,
0 AS GrossFlow,0 AS GrossWeight,0 AS GrossReturn,0 AS NetFlow,0 AS NetWeight,
0 AS NetReturn,0 AS PortfolioFees,0 AS PortfolioExpenses,0 AS ManagerFees,0 AS Income

from (select distinct CustodianAccountNum, enddate from Allbucket) d join
   (select distinct CustodianAccountNum, symbol from Allbucket) s
   on d.CustodianAccountNum = s.CustodianAccountNum CROSS apply
   (select top 1 ab.*
   from Allbucket ab
   where d.CustodianAccountNum = ab.CustodianAccountNum and
      d.enddate <= ab.enddate and
      s.symbol = ab.symbol
            AND ab.CustodianAccountNum = 'A9G040819'
   order by d.enddate desc
   ) ab
Jay C
  • 842
  • 6
  • 17
  • 37

2 Answers2

0

You can generate the rows by essentially using a cross join. In this case, it is actually a join on the distinct dates and symbol for CustodianAccountNum, but it is still a Cartesian product.

Then, the most recent record for the combination of CustodianAccountNum, symbol, and EndDate can be chosen using outer apply.

The following is a slight variation. This uses left join to bring in the matching record and then uses information from the two records when there is no match. I'm not sure what columns should be 0, but the idea is:

select ab.CustodianAccountNum, ab.symbol, d.EndDate, ab.ManagerName,
       ab.MarketValue, 0 as NetReturn,
       ab.xxx,                      -- for columns that come from the current row
       coalesce(ab.yyy, abprev.yyy) -- for columns from the previous row
from (select distinct CustodianAccountNum, enddate from Allbucket) d join
     (select distinct CustodianAccountNum, symbol from Allbucket) s
     on d.CustodianAccountNum = s.CustodianAccountNum left join
     Allbucket ab
     on d.CustodianAccountNum = ab.CustodianAccountNum and
        d.enddate <= ab.enddate and
        s.symbol = ab.symbol outer apply
     (seleect top 1 ab.*
      from Allbucket ab
      where d.CustodianAccountNum = ab.CustodianAccountNum and
            d.enddate < ab.enddate and
            s.symbol = ab.symbol
      order by d.enddate desc
     ) abprev
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
0

A slightly different approach, but still using a Cartesian produce and an APPLY operator (OUTER APPLY is needed in this one). To get 0 where you do not want the prior value carried forward, just amend the COALESCE() accordingly.

SQL Fiddle

MS SQL Server 2014 Schema Setup:

CREATE TABLE Allbucket
    ([CustodianAccountNum] varchar(9), [symbol] varchar(5), [EndDate] datetime, [ManagerName] varchar(16), [MarketValue] numeric
     , [NetReturn] decimal(12,6))
;

INSERT INTO Allbucket
    ([CustodianAccountNum], [symbol], [EndDate], [ManagerName], [MarketValue], [NetReturn])
VALUES
    ('A9G040819', 'wabix', '2013-12-31 00:00:00', 'GMO Benchmark', 34751.10987, 0.004072),
    ('A9G040819', 'wabix', '2014-01-31 00:00:00', 'GMO Benchmark', 34128.88767, -0.017905),
    ('A9G040819', 'wabix', '2014-02-28 00:00:00', 'GMO Benchmark', 49969.8081, 0.0202),
    ('A9G040819', 'wabix', '2014-03-31 00:00:00', 'GMO Benchmark', 50370.993, 0.008028),
    ('A9G040819', 'wabix', '2014-04-30 00:00:00', 'GMO Benchmark', 50995.0584, 0.012389),
    ('A9G040819', 'amj', '2013-12-31 00:00:00', 'JPMorgan Alerian', 1234.55, -0.008154),
    ('A9G040819', 'amj', '2014-02-28 00:00:00', 'JPMorgan Alerian', 14849.76, -0.018599),
    ('A9G040819', 'amj', '2014-03-31 00:00:00', 'JPMorgan Alerian', 14892.8, 0.015203),
    ('A9G040819', 'amj', '2014-04-30 00:00:00', 'JPMorgan Alerian', 15513.6, 0.041684)
;

Query 1:

SELECT
      s.CustodianAccountNum
    , s.symbol
    , d.enddate
    , COALESCE(ab.ManagerName, ap.ManagerName) AS ManagerName
    , COALESCE(ab.MarketValue, ap.MarketValue) AS MarketValue
    , COALESCE(ab.NetReturn, 0) AS NetReturn
FROM (
      SELECT
            CustodianAccountNum
          , symbol
          , MIN(enddate) symstart
          , MAX(enddate) symend
      FROM Allbucket
      GROUP BY
            CustodianAccountNum
          , symbol
      ) s
      JOIN (
            SELECT DISTINCT
                  cast(enddate as date) as enddate
            FROM Allbucket
      ) d ON d.enddate BETWEEN s.symstart AND s.symend
      LEFT JOIN Allbucket ab ON s.CustodianAccountNum = ab.CustodianAccountNum
                  AND s.symbol = ab.symbol
                  AND ab.enddate = d.enddate
      OUTER APPLY (
            SELECT TOP 1
                  t.*
            FROM Allbucket t
            WHERE s.CustodianAccountNum = t.CustodianAccountNum
                  AND s.symbol = t.symbol
                  AND d.enddate <= t.enddate
            ORDER BY
                  d.enddate DESC
      ) ap

Results:

| CustodianAccountNum | symbol |    enddate |      ManagerName | MarketValue | NetReturn |
|---------------------|--------|------------|------------------|-------------|-----------|
|           A9G040819 |    amj | 2013-12-31 | JPMorgan Alerian |        1235 | -0.008154 |
|           A9G040819 |    amj | 2014-01-31 | JPMorgan Alerian |       14850 |         0 |
|           A9G040819 |    amj | 2014-02-28 | JPMorgan Alerian |       14850 | -0.018599 |
|           A9G040819 |    amj | 2014-03-31 | JPMorgan Alerian |       14893 |  0.015203 |
|           A9G040819 |    amj | 2014-04-30 | JPMorgan Alerian |       15514 |  0.041684 |
|           A9G040819 |  wabix | 2013-12-31 |    GMO Benchmark |       34751 |  0.004072 |
|           A9G040819 |  wabix | 2014-01-31 |    GMO Benchmark |       34129 | -0.017905 |
|           A9G040819 |  wabix | 2014-02-28 |    GMO Benchmark |       49970 |    0.0202 |
|           A9G040819 |  wabix | 2014-03-31 |    GMO Benchmark |       50371 |  0.008028 |
|           A9G040819 |  wabix | 2014-04-30 |    GMO Benchmark |       50995 |  0.012389 |

nb: You could use ISNULL() instead of COALESCE()

[EDITS] made correction for data type on NetValue, & change on enddate to date but that is optional

Paul Maxwell
  • 33,002
  • 3
  • 32
  • 51