Building data sets from multiple multivalue records using applys with missing values

Question

I have a SQL Server 2012 database which was imported from a multi-value environment which causes me more headaches than I care to count, however it is what it is and I have to work with it.

I am trying to build a data set using these multi value records but have hit a stumbling bock. This is my scenario

I have a custom split string TVF that splits a string of "Test,String" into

Rowno | Item
------+---------
1     | Test
2     | String

I have the following data:

Clients Table

Ref | Names     | Surname   | DOB                 | IdNo
----+-----------+-----------+---------------------+------    
123 |John,Sally |Smith     | DOB1,DoB2            | 45,56
456 |Dave,Paul  |Jones,Dann| DOB1,DOB2            | 98
789 |Mary,Moe,Al|Lee       | DOB1                 | NULL

What I need to create is a data set that looks like this:

Ref | Names     | Surname   | DOB                 | IdNo
----+-----------+-----------+---------------------+------    
123 | John      | Smith     | DOB1                | 45
123 | Sally     | Smith     | DOB2                | 56
456 | Dave      | Jones     | DOB1                | 98
456 | Paul      | Dann      | DOB2                |
789 | Mary      | Lee       | DOB1                | 
789 | Moe       | Lee       |                     | 
789 | Al        | Lee       |                     |

In the past, to solve similar issues, I would tackle this using a query like this:

SELECT 
    Ref
    , SplitForenames.ITEM names
    , SplitSurname.ITEM Surname
    , SplitDOB.ITEM dob
    , SplitNI.ITEM ID
FROM
    Clients
CROSS APPLY 
    dbo.udf_SplitString(Names, ',') SplitForenames
OUTER APPLY 
    dbo.udf_SplitString(Surname, ',') SplitSurname
OUTER APPLY 
    dbo.udf_SplitString(DOB, ',') SplitDOB
OUTER APPLY 
    dbo.udf_SplitString(ID, ',') SplitNI
WHERE
    SplitSurname.RowNo = SplitForenames.RowNo
    AND SplitDOB.RowNo = SplitForenames.RowNo
    AND SplitNI.RowNo = SplitForenames.RowNo    
ORDER BY 
    REF;

However due to there being examples of differences between the number of surnames to forenames and missing DOB and ID fields i cannot match them in this way.

I need to match where there is a match then otherwise be blank for DOB and ID and use the first instance of the surname. I am just stuck as to how to achieve this.

Anyone have any suggestions as to how i can create my required data-set from the original source.

Thanks in advance

Could you explain what are the condition how to assign DOB values? When you split and when you use all and when should be empty ? — pi.314, Aug 31 '18 at 13:34
Of course...So for each value in DOB (and ID) should be unique for each First name. Where the instances of values in either of these columns do not match the number of first names then each value should be assigned to each first name from left to right. Only for the surname should the value be applied to all first names if unique values aren't provided for each. If no value is provided for ID or DOB the should be blank for all Names. Hope this helps — Brenns1, Aug 31 '18 at 14:05
@ P. ejH that was a mistake that has now been corrected,thank-you for pointing out — Brenns1, Aug 31 '18 at 14:20

score 1 · Answer 1 · answered Aug 31 '18 at 11:42

1

I think you can handle this using subqueries and doing the RowNo comparison before the OUTER APPLY:

FROM Clients c CROSS APPLY
     dbo.udf_SplitString(Names, ',') SplitForenames OUTER APPLY 
     (SELECT . . .
      FROM dbo.udf_SplitString(Surname, ',') SplitSurname
      WHERE SplitSurname.RowNo = SplitForenames.RowNo
     ) SplitSurname OUTER APPLY
     (SELECT . . .
      FROM dbo.udf_SplitString(DOB, ',') SplitDOB
      WHERE SplitDOB.RowNo = SplitForenames.RowNo
     ) SplitDOB OUTER APPLY
     (SELECT . . .
      FROM dbo.udf_SplitString(DOB, ',') SplitNI
      WHERE SplitNI.RowNo = SplitForenames.RowNo
     ) SplitNI

answered Aug 31 '18 at 11:42

Gordon Linoff

1,242,037
58
646
786

Unless i'm missing something i cant reference the previous sub query with each apply sub-query – Brenns1 Aug 31 '18 at 12:46
@Brenns1 . . . Actually, you can. But just to be clear, these are referencing from `SplitForNames` rather than the previous subquery. Do you find that this doesn't work? – Gordon Linoff Aug 31 '18 at 12:48
Thanks for your quick responses, I cannot use the where within the Outer Apply SubQuery as I cannot reference SplitForenames from within the sub query, putting the where clause outside the subquery is the where clause for the entire query not allowing any further outer applys afterwards... – Brenns1 Aug 31 '18 at 12:59

score 1 · Accepted Answer · answered Aug 31 '18 at 14:02

I cant find what are the condition of DOB column to be split or not. However: with the Split function SpliF as below:

CREATE FUNCTION SplitF(@str AS NVARCHAR(max))
RETURNS @People TABLE
(Rowno INT,Item NVARCHAR(10))
AS
BEGIN
    DECLARE @i INT, @pos INT
    DECLARE @subname NVARCHAR(max)
    SET @I = 0;
    WHILE(LEN(@str)>0)
    BEGIN
        SET @pos = CHARINDEX(',',@str)
        IF @pos = 0 SET @pos = LEN(@str)+1
        SET @subname = SUBSTRING(@str,1,@pos-1)
        SET @str = SUBSTRING(@str, @pos+1, len(@str))
        SET @i = @i + 1
        INSERT INTO @People VALUES (@i, @subname)
    END
    RETURN
END
GO
select * from SplitF('test,my,function')

Rowno       Item
----------- ----------
1           test
2           my
3           function

and basic data:

select Ref, Names, Surname, DOB, IdNo into #clients
from (  select 123 as Ref, 'John,Sally' as Names, 'Smith' as Surname,
                 'DOB1,DOB2' as DOB, '45,56' as IdNo
        union all select 456, 'Dave,Paul','Jones,Dann','DOB1,DOB2', '98'
        union all select 789, 'Mary,Moe,Al', 'Lee', 'DOB1', NULL) A

select * from #clients

Ref         Names       Surname    DOB       IdNo
----------- ----------- ---------- --------- -----
123         John,Sally  Smith      DOB1,DOB2 45,56
456         Dave,Paul   Jones,Dann DOB1,DOB2 98
789         Mary,Moe,Al Lee        DOB1      NULL

using below code you will get such results:

select
    Ref,
    RTrim(S_NAM.Item) as Names,
    coalesce(S_SURNAM.Item,S_SURNAM_LAST.Item) AS Surname,
    coalesce(split_dob.Item, '') as DOB,
    coalesce(split_IdNo.Item,'') as IdNo
from
    #clients MAIN
    outer apply(select Rowno, Item from SplitF(MAIN.Names)) as S_NAM
    outer apply(select top 1 Item from SplitF(MAIN.Surname) where Rowno = S_NAM.Rowno) as S_SURNAM
    outer apply(select top 1 Item from SplitF(MAIN.Surname) order by Rowno desc) as S_SURNAM_LAST
    outer apply(select top 1 Item from SplitF(MAIN.IdNo) where Rowno = S_NAM.Rowno) as split_IdNo
    outer apply(select top 1 Item from SplitF(MAIN.DOB) where Rowno = S_NAM.Rowno) as split_dob
order by MAIN.Ref, S_NAM.Rowno

Ref         Names      Surname    DOB        IdNo
----------- ---------- ---------- ---------- ----------
123         John       Smith      DOB1       45
123         Sally      Smith      DOB2       56
456         Dave       Jones      DOB1       98
456         Paul       Dann       DOB2       
789         Mary       Lee        DOB1       
789         Moe        Lee                   
789         Al         Lee

With the addition of the correct sub queries for each apply and the coalesces I have been able to replicate your test and get that data required. Thankyou for your assistance! — Brenns1, Aug 31 '18 at 15:03

Building data sets from multiple multivalue records using applys with missing values

2 Answers2