0

I'm currently in a scenario where we have a clunky, old, monolith of a database powering one or two of our less user-friendly systems (this is controlled by a third-party, so I can't change that).

Going forward, I'm looking to push the necessary data into a new (better structured) database and implement a mechanism to keep the data in sync between the two.

One of the quirky traits of this old database is that, instead of just using null values, it uses empty strings and treats them as if they're null.

In the new database, I want to use nulls where there's no data (because I can't think of any good reasons not to).

My issue is when I'm pulling the data from the old database to the new one (using MERGE statements), I'm using a NULLIF([myCol], '') to check there's actually any data - and just treating it as a null if not.

For example, when syncing the data for students I'm planning on doing something along the lines of (the name of the databases and schemas isn't relevant to the question so I'm anonymizing the structure a bit):

USE [NewDB];

DROP TABLE IF EXISTS #myTempTable;

SELECT *
INTO #myTempTable
FROM [OldDB].[schemaName].[Students];

MERGE [schema].[Person] p
USING #myTempTable s 
      ON TRIM(s.STUD_ISN) = p.StudentDBID
--WHEN MATCHED and not equal, update
WHEN NOT MATCHED 
        THEN INSERT (
                        [Surname] ,
                        [PreferredSurname] ,
                        [FormerSurname] ,
                        [Forename] ,
                        [MiddleNames] , 
                        [PreferredForename] ,
                        [DoB] ,
                        [Gender] ,
                        [GenderIdentity] ,
                        [Title] ,
                        [Deceased] ,
                        [DeceasedDate] ,
                        [StudentDBID])
             VALUES ( 
                        NULLIF(TRIM([s].[STUD_Surname]), '') ,
                        NULLIF(TRIM([s].[STUD_Preferred_Surname]), '')  ,
                        NULLIF(TRIM([s].STUD_Former_Surname), '')  ,
                        NULLIF(TRIM([s].[STUD_Forename_1]), '')  ,
                        NULLIF(TRIM([s].[STUD_Forename_2]), '')  ,
                        NULLIF(TRIM([s].[STUD_Known_As]), '')  ,
                        [s].[STUD_DOB] ,
                        NULLIF(TRIM([s].[STUD_Gender]), '')  ,
                        NULLIF(TRIM([s].STUD_Gender_Identity), '')  ,
                        NULLIF(TRIM([s].STUD_Title), '')  ,
                        [s].STUD_Deceased ,
                        [s].STUD_Date_of_Death ,
                        TRIM([s].STUD_ISN)
                    );

Now, obviously this current query only deals with the following scenario:

The old database has data and the new one doesn't

Because I'm having some grief with the logic for the following scenario:

The new database has the corresponding record, but it needs updating.

For the text based fields there's 3 different scenarios where the "new" data would need updating:

  1. Neither value is null/empty, but they're not equal
  2. The record in the "old" database is null/empty and the "new" one isn't
  3. The record in the "new" database is null and the "old" one isn't

I was hoping to do something relatively simple like:

WHEN MATCHED AND ( 
                    (NULLIF(TRIM([s].[STUD_Surname]), '') IS NOT NULL
                        AND p.Surname IS NULL)
                    OR (NULLIF(TRIM([s].[STUD_Surname]), '') IS NULL
                        AND p.Surname IS NOT NULL)
                    OR (NULLIF(TRIM([s].[STUD_Surname]), '') != p.Surname)
                 ) -- do the same for the other columns
THEN UPDATE SET
                Surname = s.STUD_Surname

But, NULLIF(TRIM([s].[STUD_Surname]), '') IS NOT NULL is apparently not valid?

Is my best option to just switch out the NULLIF for an ISNULL in the WHEN MATCHED clause... Or am I missing something that'll make it more efficient?

Andrew Corrigan
  • 1,017
  • 6
  • 23
  • There's no way to make this efficient. You'll have to change the condition. Applying a function to a field means there's no way to use any indexes that cover that field. The indexes are constructed using the field values, while you try to search using a completely different value. That's why you should avoid using any functions on indexed fields in the WHERE clause – Panagiotis Kanavos Jan 14 '22 at 10:09
  • 4
    If you worry about leading or trailing whitespace *clean the data before storing it*. If the existing data is dirty, clean it before trying to use it any further. If you can't modify the source data, copy it into a staging table using an `INSERT ... SELECT` that trims the fields – Panagiotis Kanavos Jan 14 '22 at 10:11
  • Does the source data really contain whitespace? Not just an empty string but one or more whitespace characters? The expression `(STUD_Surname is null or STUD_Surname ='')` can use indexes because it's checking for the existence of specific values – Panagiotis Kanavos Jan 14 '22 at 10:16
  • Yeah, the source data is a tangled mess - although... why didn't I think of cleaning it in the temp table. Guess I need more coffee in my coffee – Andrew Corrigan Jan 14 '22 at 10:46
  • An varchar column containing any number of spaces will evaluate as TRUE when compared to an empty string - so there is no reason to trim the column. And quite frankly, why do you think you need to trim so much? Why do you need to insert rows into a temp table before the MERGE? And look more closely at `THEN UPDATE SET Surname = s.STUD_Surname`. You should see a problem there. – SMor Jan 14 '22 at 12:12

3 Answers3

0

The checks for the update can be simplified.

Since the source temp table doesn't have NULL's.
You just need to compare the trimmed values.

MERGE Person t
USING #tmpTable s 
   ON s.STUD_ISN = t.StudentDBID
WHEN NOT MATCHED THEN 
  INSERT (
    StudentDBID
  , Forename
  , PreferredForename
  , Surname
  )
  VALUES (
    NULLIF(TRIM(s.STUD_ISN), '')
  , NULLIF(TRIM(s.STUD_Forename), '')
  , NULLIF(TRIM(s.STUD_Forename_1), '')
  , NULLIF(TRIM(s.STUD_Surname), '')
  )
WHEN MATCHED
AND (
     COALESCE(t.Forename, '') != TRIM(s.STUD_Forename)
  OR COALESCE(t.PreferredForename, '') != TRIM(s.STUD_Forename_1)
  OR COALESCE(t.Surname, '') != TRIM(s.STUD_Surname)
) 
THEN 
UPDATE SET
  UpdatedOn = SYSDATETIME()
, Forename = NULLIF(TRIM(s.STUD_Forename), '')
, PreferredForename = NULLIF(TRIM(s.STUD_Forename_1), '')
, Surname = NULLIF(TRIM(s.STUD_Surname), '')
;
SELECT *
FROM Person
StudentDBID | Forename | PreferredForename | Surname | UpdatedOn                   | CreatedOn                  
:---------- | :------- | :---------------- | :------ | :-------------------------- | :--------------------------
B9701       | Bob      | Bobby             | Modest  | 2022-01-14 12:58:05.9862269 | 2022-01-14 12:58:05.9862269
J0402       | Jane     | JD                | Doe     | 2022-01-14 12:58:05.9862269 | 2022-01-14 12:58:05.9862269
J0504       | Jim      | Jimmy             | Savage  | null                        | 2022-01-14 12:58:05.9862269
J7201       | John     | null              | Doe     | null                        | 2022-01-14 12:58:05.9862269
M0803       | Mike     | null              | Hammer  | 2022-01-14 12:58:05.9862269 | 2022-01-14 12:58:05.9862269

Demo on db<>fiddle here

LukStorms
  • 28,916
  • 5
  • 31
  • 45
0

In the end, my solution was to apply the various functions within the SELECT .... Into .... block. Like so:

ALTER PROCEDURE [NGSync].[spFullStudentSync]
AS
BEGIN
    SET NOCOUNT ON;

    DROP TABLE IF EXISTS #students

    SELECT [STUD_ISN],
           NULLIF(TRIM([STUD_Student_ID]), '') AS [STUD_Student_ID] ,
           NULLIF(TRIM([STUD_Surname]), '') AS [STUD_Surname] ,
           NULLIF(TRIM([STUD_Forename_1]), '') AS [STUD_Forename_1] ,
           NULLIF(TRIM([STUD_Forename_2]), '') AS [STUD_Forename_2] ,
           NULLIF(TRIM([STUD_Known_As]), '') AS [STUD_Known_As] ,
           [STUD_DOB] ,
           NULLIF(TRIM([STUD_Gender]) , '') AS [STUD_Gender],
           NULLIF(TRIM([STUD_Title]) , '') AS [STUD_Title],
           NULLIF(TRIM([STUD_Ethnicity]) , '') AS [STUD_Ethnicity],
           NULLIF(TRIM([STUD_LDDHP]) , '') AS [STUD_LDDHP],
           NULLIF(TRIM([STUD_Home_Telephone_No] ) , '') AS [STUD_Home_Telephone_No] ,
           NULLIF(TRIM([STUD_Daytime_Telephone] ) , '') AS [STUD_Daytime_Telephone] ,
           NULLIF(TRIM([STUD_Mobile_Telephone] ) , '') AS [STUD_Mobile_Telephone] ,
           NULLIF(TRIM([STUD_EMail_Address] ) , '') AS [STUD_EMail_Address] ,
           NULLIF(TRIM([STUD_Former_Surname] ) , '') AS [STUD_Former_Surname] ,
           CAST( CASE WHEN NULLIF(TRIM([STUD_Deceased] ) , '') = 'D' THEN 1 else 0 end AS bit ) AS [STUD_Deceased] ,
           NULLIF(TRIM([STUD_Deletion_Flag] ) , '') AS [STUD_Deletion_Flag] ,
           [STUD_Delete_Merge]  ,
           NULLIF(TRIM([STUD_Photo_filename] ) , '') AS [STUD_Photo_filename] ,
           NULLIF(TRIM([STUD_Nationality] ) , '') AS [STUD_Nationality] ,
           [STUD_Date_of_Entry_in_UK] ,
           NULLIF(TRIM([STUD_Student_Type_FESR] ) , '') AS [STUD_Student_Type_FESR] ,
           [STUD_School_ISN] ,
           NULLIF(TRIM([STUD_Home_LEA] ) , '') AS [STUD_Home_LEA] ,
           NULLIF(TRIM([STUD_Employer_Code] ) , '') AS [STUD_Employer_Code] ,
           NULLIF(TRIM([STUD_Religion] ) , '') AS [STUD_Religion] ,
           NULLIF(TRIM([STUD_Location] ) , '') AS [STUD_Location] ,
           [STUD_TPS_Include] ,
           [STUD_QOE_Complete] ,
           NULLIF(TRIM([STUD_UCAS_Application_Code] ) , '') AS [STUD_UCAS_Application_Code] ,
           [STUD_MIAP_Consent_Status] ,
           [STUD_MIAP_Verification_Type] ,
           NULLIF(TRIM([STUD_MIAP_Other_Verification] ) , '') AS [STUD_MIAP_Other_Verification] ,
           NULLIF(TRIM([STUD_Bank_Sort_Code] ) , '') AS [STUD_Bank_Sort_Code] ,
           NULLIF(TRIM([STUD_Bank_AC_No] ) , '') AS [STUD_Bank_AC_No] ,
           [STUD_Bank_ISN] ,
           NULLIF(TRIM([STUD_Bank_Postcode] ) , '') AS [STUD_Bank_Postcode] ,
           NULLIF(TRIM([STUD_Bank_AC_Holders_Name] ) , '') AS [STUD_Bank_AC_Holders_Name] ,
           NULLIF(TRIM([STUD_Current_Tutor_Group] ) , '') AS [STUD_Current_Tutor_Group] ,
           NULLIF(TRIM([STUD_Current_PostCode] ) , '') AS [STUD_Current_PostCode] ,
           NULLIF(TRIM([STUD_Doctor_Name] ) , '') AS [STUD_Doctor_Name] ,
           NULLIF(TRIM([STUD_Doctor_Telephone] ) , '') AS [STUD_Doctor_Telephone] ,
           [STUD_ULN]  ,
           NULLIF(TRIM([STUD_College_Email_Address] ) , '') AS [STUD_College_Email_Address] ,
           [STUD_Date_Due_to_Leave_UK] ,
           [STUD_UK_Residence] ,
           NULLIF(TRIM([STUD_Parish] ) , '') AS [STUD_Parish] ,
           NULLIF(TRIM([STUD_Area] ) , '') AS [STUD_Area] ,
           [STUD_Bus_Pass] ,
           NULLIF(TRIM([STUD_Bus_Route] ) , '') AS [STUD_Bus_Route] ,
           [STUD_Eng_1st_Lang] ,
           NULLIF(TRIM([STUD_Language] ) , '') AS [STUD_Language] ,
           [STUD_High_Achiever] ,
           [STUD_Is_Staff] ,
           [STUD_Is_Staff_ISN] ,
           [STUD_Excluded] ,
           [STUD_RUI_3_no_contact] ,
           [STUD_RUI_1_courses] ,
           [STUD_RUI_2_surveys] ,
           [STUD_PMC_C1_post] ,
           [STUD_PMC_C2_phone] ,
           [STUD_PMC_C3_email] ,
           [STUD_Created_Date] ,
           NULLIF(TRIM([STUD_Created_User] ) , '') AS [STUD_Created_User] ,
           NULLIF(TRIM([STUD_Created_Prog] ) , '') AS [STUD_Created_Prog] ,
           [STUD_Modified_Date] ,
           NULLIF(TRIM([STUD_Modified_User] ) , '') AS [STUD_Modified_User] ,
           NULLIF(TRIM([STUD_Modified_Prog] ) , '') AS [STUD_Modified_Prog] ,
           [STUD_OK_to_use_Image] ,
           NULLIF(TRIM([STUD_Sexual_Orientation] ) , '') AS [STUD_Sexual_Orientation] ,
           NULLIF(TRIM([STUD_Gender_Identity] ) , '') AS [STUD_Gender_Identity] ,
           [STUD_Visa_Proof_Produced] ,
           [STUD_Visa_Proof_Produced_Date] ,
           NULLIF(TRIM([STUD_Visa_Proof_Produced_Details] ) , '') AS [STUD_Visa_Proof_Produced_Details] ,
           NULLIF(TRIM([STUD_Visa_Type] ) , '') AS [STUD_Visa_Type] ,
           [STUD_Visa_Expiry_Date] ,
           [STUD_Visa_Letter_Issued] ,
           [STUD_Asylum_Seeker] ,
           [STUD_Refugee] ,
           [STUD_Entered_UK_for_Education] ,
           [STUD_Restrictions_on_Stay] ,
           NULLIF(TRIM([STUD_Add_To_Portal_Title_Bar] ) , '') AS [STUD_Add_To_Portal_Title_Bar] ,
           [STUD_RUI_5] ,
           NULLIF(TRIM([STUD_Visa_Reference]) , '') AS  [STUD_Visa_Reference],
           NULLIF(TRIM([STUD_Visa_Note] ) , '') AS [STUD_Visa_Note] ,
           [STUD_FA_Bank_ISN] ,
           NULLIF(TRIM([STUD_FA_Bank_Sort_Code] ) , '') AS [STUD_FA_Bank_Sort_Code] ,
           NULLIF(TRIM([STUD_FA_Bank_AC_No] ) , '') AS [STUD_FA_Bank_AC_No] ,
           NULLIF(TRIM([STUD_FA_Bank_AC_Holders_Name]) , '') AS  [STUD_FA_Bank_AC_Holders_Name],
           NULLIF(TRIM([STUD_Marital_Status] ) , '') AS [STUD_Marital_Status] ,
           NULLIF(TRIM([STUD_Country_of_Birth] ) , '') AS [STUD_Country_of_Birth] ,
           NULLIF(TRIM([STUD_On_Supervision] ) , '') AS [STUD_On_Supervision] ,
           [STUD_S_IraqDEAS] ,
           [STUD_S_Pending_SCN] ,
           NULLIF(TRIM([STUD_Primary_LLDDCode] ) , '') AS [STUD_Primary_LLDDCode] ,
           NULLIF(TRIM([STUD_A2C_Language] ) , '') AS [STUD_A2C_Language] ,
           NULLIF(TRIM([STUD_Signature_Filename] ) , '') AS [STUD_Signature_Filename] ,
           [STUD_S_ILA_Expiry_Date] ,
           [STUD_Date_Of_Death] ,
           [STUD_S_SQA_Exception] ,
           NULLIF(TRIM([STUD_S_SQA_VID] ) , '') AS [STUD_S_SQA_VID] ,
           [STUD_S_SQA_Registration_Date] ,
           NULLIF(TRIM([STUD_FA_Bank_AC_Ref] ) , '') AS [STUD_FA_Bank_AC_Ref] ,
           NULLIF(TRIM([STUD_Bank_AC_Ref] ) , '') AS [STUD_Bank_AC_Ref] ,
           [STUD_Anonymised] ,
           NULLIF(TRIM([STUD_Quarantine_Status] ) , '') AS [STUD_Quarantine_Status] ,
           [STUD_KeycloakID] ,
           NULLIF(TRIM([STUD_Preferred_Surname] ) , '') AS [STUD_Preferred_Surname] ,
           [STUD_Exclude_From_Balancing] ,
           NULLIF(TRIM([STUD_Corresp_Preference] ) , '') AS [STUD_Corresp_Preference] ,
           NULLIF(TRIM([STUD_HESA_ID] ) , '') AS [STUD_HESA_ID] ,
           NULLIF(TRIM([STUD_FEPUS_ID] ) , '') AS [STUD_FEPUS_ID] ,
           NULLIF(TRIM([STUD_PEV_PIN] ) , '') AS [STUD_PEV_PIN] ,
           NULLIF(TRIM([STUD_Photo_GUID_Filename] ) , '') AS [STUD_Photo_GUID_Filename]
    INTO #students
    FROM [NG].[dbo].[STUDstudent];

    MERGE [people].[Person] AS [p]
    USING #students AS [s]
        ON [s].[STUD_ISN] = [p].[StudentDBID]
    WHEN MATCHED
            AND (
                    [s].[STUD_Surname] != [p].[Surname]
                        OR [s].[STUD_Preferred_Surname] != [p].[PreferredSurname]
                        OR [s].[STUD_Former_Surname] != [p].[FormerSurname]
                        OR [s].[STUD_Forename_1] != [p].[Forename]
                        OR [s].[STUD_Forename_2] != [p].[MiddleNames]
                        OR [s].[STUD_Known_As] != [p].[PreferredForename]
                        OR [s].[STUD_DoB] != [p].[DoB]
                        OR [s].[STUD_Gender] != [p].[Gender]
                        OR [s].[STUD_Gender_Identity] != [p].[GenderIdentity]
                        OR [s].[STUD_Title] != [p].[Title]
                        OR [s].[STUD_Deceased] != [p].[Deceased]
                        OR [s].[STUD_Date_of_Death] != [p].[DeceasedDate]
                        OR [s].[STUD_Is_Staff_ISN] != [p].[StaffDBID]
                )
       THEN UPDATE SET
                            [Surname] = [s].[STUD_Surname] ,
                            [PreferredSurname] = [s].[STUD_Preferred_Surname] ,
                            [FormerSurname] = [s].[STUD_Former_Surname],
                            [Forename] = [s].[STUD_Forename_1] ,
                            [MiddleNames] = [s].[STUD_Forename_2] ,
                            [PreferredForename] = [s].[STUD_Known_As],
                            [DoB] = [s].[STUD_DoB] ,
                            [Gender] = [s].[STUD_Gender] ,
                            [GenderIdentity] = [s].[STUD_Gender_Identity] ,
                            [Title] = [s].[STUD_Title] ,
                            [Deceased] = [s].[STUD_Deceased],
                            [StaffDBID] = [s].[STUD_Is_Staff_ISN]
    WHEN NOT MATCHED
        THEN INSERT (
                        [Surname] ,
                        [PreferredSurname] ,
                        [FormerSurname] ,
                        [Forename] ,
                        [MiddleNames] ,
                        [PreferredForename] ,
                        [DoB] ,
                        [Gender] ,
                        [GenderIdentity] ,
                        [Title] ,
                        [Deceased] ,
                        [DeceasedDate] ,
                        [StudentDBID] ,
                        [StaffDBID])
             VALUES (
                        [s].[STUD_Surname] ,
                        [s].[STUD_Preferred_Surname] ,
                        [s].[STUD_Former_Surname] ,
                        [s].[STUD_Forename_1] ,
                        [s].[STUD_Forename_2] ,
                        [s].[STUD_Known_As]  ,
                        [s].[STUD_DOB] ,
                        [s].[STUD_Gender] ,
                        [s].[STUD_Gender_Identity]  ,
                        [s].[STUD_Title] ,
                        [s].[STUD_Deceased] ,
                        [s].[STUD_Date_of_Death] ,
                        [s].[STUD_ISN] ,
                        [s].[STuD_IS_Staff_ISN]
                    );

END

It could probably be made more efficient but, considering it's going to part of a nightly task and run when people aren't using the system, it handles ~35000 records in ~2 seconds so it's "efficient enough".

Andrew Corrigan
  • 1,017
  • 6
  • 23
-2

You can use the Coalesce Function available in SQL to achieve the desired result

It will return the first Non-Null value from the passed parameters

SELECT COALESCE( NULL ,'First Non Null','Alex')

COALESCE Keyword