1

I am an experienced Azure user when it comes to data infrastructure and pipelines. However, I am new to processing XML data and my particular case seems somewhat complex. After exhausting my resources I am turning to the community for help.

Please note all data shown is demo data. There is no PII being leaked

The XML data is present in my blob (ADLS2). I am using Data Factory V2 copy activity to copy the data into my Azure SQL database. All permissions are appropriately configured.

Notice in the XML inside the 'Applicant' section you can see 'Type' of 'primary' which is always present. IF there is a coBorrower then there will be a second 'Applicant' portion with Type 'coBorrower'.

Portion of XML under question:

<LoanExport>
   <Application>
      <GeneratedDate>06/15/2023</GeneratedDate>
      <BusinessApplication/>
      <ApplicationNumber>132</ApplicationNumber>
      <ApplicationType>Joint</ApplicationType>
      <Business>false</Business>
      <Applicants>
         <Applicant>
            <Type>primary</Type>
            <FirstName>Robert</FirstName>
            <LastName>Ice</LastName>
            <MiddleName/>
            <NameSuffix/>
            <DateOfBirth>02/22/1992</DateOfBirth>
            <CreditScore/>
            <SocialSecurityNumber>666423221</SocialSecurityNumber>
            <IsUSCitizen>true</IsUSCitizen>
            <PhoneNumber>5555555555</PhoneNumber>
            <WorkPhoneNumber/>
            <CellularPhoneNumber>3334445555</CellularPhoneNumber>
            <EmailAddress>demomm91zft@tcicredit.com</EmailAddress>
            <DriversLicense/>
            <DriversLicenseState/>
            <DriversLicenseExpirationDate/>
            <DriversLicenseIssueDate/>
            <MaritalStatus>false</MaritalStatus>
            <Addresses>
               <Address>
                  <Type>Primary</Type>
                  <Address>
                     <Number>1</Number>
                     <StreetPrefix/>
                     <StreetName>Hawk Drive</StreetName>
                     <StreetType/>
                     <StreetSuffix/>
                     <ApartmentOrSuiteNumber/>
                     <PostalCode>60750</PostalCode>
                     <City>FANTASY ISLAND</City>
                     <State>MD</State>
                  </Address>
                  <TimeAtResidenceYears>5</TimeAtResidenceYears>
                  <TimeAtResidenceMonths>7</TimeAtResidenceMonths>
                  <Ownership>Own</Ownership>
                  <MonthlyPayment currency="USD">550.00</MonthlyPayment>
               </Address>
            </Addresses>
            <Employments>
               <Employment>
                  <isCurrentEmployment>true</isCurrentEmployment>
                  <Status>Employed</Status>
                  <EmployerName>TCI</EmployerName>
                  <ContactName/>
                  <ContactPhone>1111111111</ContactPhone>
                  <JobPosition>Business Analyst</JobPosition>
                  <Address>
                     <StreetName/>
                     <PostalCode/>
                     <City/>
                     <State/>
                  </Address>
                  <TimeAtEmployerYears>10</TimeAtEmployerYears>
                  <TimeAtEmployerMonths>6</TimeAtEmployerMonths>
                  <GrossMonthlyIncome currency="USD">0.00</GrossMonthlyIncome>
               </Employment>
            </Employments>
            <DenialReasons>
               <DenialReason>
                  <Reason>Applicant has no SS number</Reason>
               </DenialReason>
            </DenialReasons>
         </Applicant>
         <Applicant>
            <Type>coBorrower</Type>
            <FirstName>asd</FirstName>
            <LastName>asd</LastName>
            <MiddleName>asd</MiddleName>
            <NameSuffix>IV</NameSuffix>
            <DateOfBirth>07/17/1973</DateOfBirth>
            <CreditScore/>
            <SocialSecurityNumber>333333333</SocialSecurityNumber>
            <IsUSCitizen>true</IsUSCitizen>
            <PhoneNumber>1231231231</PhoneNumber>
            <WorkPhoneNumber/>
            <CellularPhoneNumber>1231231231</CellularPhoneNumber>
            <EmailAddress>dddd@yahoo.com</EmailAddress>
            <DriversLicense/>
            <DriversLicenseState/>
            <DriversLicenseExpirationDate/>
            <DriversLicenseIssueDate/>
            <MaritalStatus>false</MaritalStatus>
            <Addresses>
               <Address>
                  <Type>Primary</Type>
                  <Address>
                     <Number/>
                     <StreetPrefix/>
                     <StreetName>asdasd</StreetName>
                     <StreetType/>
                     <StreetSuffix/>
                     <ApartmentOrSuiteNumber/>
                     <PostalCode>12345</PostalCode>
                     <City>Schenectady</City>
                     <State>NY</State>
                  </Address>
                  <TimeAtResidenceYears>17</TimeAtResidenceYears>
                  <TimeAtResidenceMonths>11</TimeAtResidenceMonths>
                  <Ownership>Living with relatives</Ownership>
                  <MonthlyPayment currency="USD">2222.00</MonthlyPayment>
               </Address>
            </Addresses>
            <Employments>
               <Employment>
                  <isCurrentEmployment>true</isCurrentEmployment>
                  <Status>Retired</Status>
                  <EmployerName/>
                  <ContactName/>
                  <ContactPhone/>
                  <JobPosition/>
                  <Address>
                     <StreetName/>
                     <PostalCode/>
                     <City/>
                     <State/>
                  </Address>
                  <TimeAtEmployerYears>0</TimeAtEmployerYears>
                  <TimeAtEmployerMonths>0</TimeAtEmployerMonths>
                  <GrossMonthlyIncome currency="USD">0.00</GrossMonthlyIncome>
               </Employment>
            </Employments>
            <DenialReasons>
               <DenialReason>
                  <Reason>Applicant has no SS number</Reason>
               </DenialReason>
            </DenialReasons>
         </Applicant>
      </Applicants>

I am using a small SQL table for testing being able to pull both the primary and coborrower data correctly.

CREATE TABLE [bronze].[loan_origination_application](
    [application_number] [nvarchar](max) NULL,
    [generated_date] [nvarchar](max) NULL,
    [applicant_primary_first_name] [nvarchar](max) NULL,
    [applicant_coborrower_first_name] [nvarchar](max) NULL
)

Here is the portion of my Azure Data Factory pipeline JSON:

                    },
                    "sink": {
                        "type": "AzureSqlSink",
                        "writeBehavior": "insert",
                        "sqlWriterUseTableLock": false,
                        "disableMetricsCollection": false
                    },
                    "enableStaging": false,
                    "translator": {
                        "type": "TabularTranslator",
                        "mappings": [
                            {
                                "source": {
                                    "path": "['Application']['GeneratedDate']"
                                },
                                "sink": {
                                    "name": "generated_date",
                                    "type": "String"
                                }
                            },
                            {
                                "source": {
                                    "path": "['Application']['ApplicationNumber']"
                                },
                                "sink": {
                                    "name": "application_number",
                                    "type": "String"
                                }
                            },
                            {
                                "source": {
                                    "path": "['Application']['Applicants']['Applicant']['FirstName']"
                                },
                                "sink": {
                                    "name": "applicant_primary_first_name",
                                    "type": "String"
                                }
                            },
                            {
                                "source": {
                                    "path": "['Application']['Applicants']['Applicant'][1]['FirstName']"
                                },
                                "sink": {
                                    "name": "applicant_coborrower_first_name",
                                    "type": "String"
                                }
                            }
                        ],
                        "collectionReference": "$['applications']['LoanExport']",
                        "mapComplexValuesToString": true
                    }

Note in all my databse output the 4 columns are respectively: 'application_number', 'generated_date', 'applicant_primary_first_name', 'applicant_coborrower_first_name'
This currently results in the following write in the databse:

132 06/15/2023  NULL    asd
134 06/15/2023  Robert  NULL
135 06/15/2023  Robert  NULL
136 06/15/2023  Robert  NULL

Note that in this case if there is a coborrower applicant (application 132) it will pull the correct name but will ignore the primary applicant. When there is no coborrower applicant then the primary applicant is pulled in accurately.

I have tried the following as well as other similar approaches:

Data Factory JSON:

                            {
                                "source": {
                                    "path": "['Application']['Applicants']['Applicant'][0]['FirstName']"
                                },
                                "sink": {
                                    "name": "applicant_primary_first_name",
                                    "type": "String"
                                }
                            },
                            {
                                "source": {
                                    "path": "['Application']['Applicants']['Applicant'][1]['FirstName']"
                                },
                                "sink": {
                                    "name": "applicant_coborrower_first_name",
                                    "type": "String"
                                }
                            }

Database output:

132 06/15/2023  Robert  asd
134 06/15/2023  NULL    NULL
135 06/15/2023  NULL    NULL
136 06/15/2023  NULL    NULL

In this case you can see the primary and coborrower applicant types are both pulled correctly but only when there is a coborrower not when there is only a primary applicant.

I also tried some Chat GPT suggestions involving filtering which did not help although I feel it is on the right track.

                            {
                                "source": {
                                    "path": "['Application']['Applicants']['Applicant'][?(@['Type']=='primary')]['FirstName']"
                                },
                                "sink": {
                                    "name": "applicant_primary_first_name",
                                    "type": "String"
                                }
                            },
                            {
                                "source": {
                                    "path": "['Application']['Applicants']['Applicant'][?(@['Type']=='coBorrower')]['FirstName']"
                                },
                                "sink": {
                                    "name": "applicant_coborrower_first_name",
                                    "type": "String"
                                }
                            }

Database output has the same effect as above. Applicant of type primary is only pulled when applicant of type coborrower is also present.

132 06/15/2023  Robert  asd
134 06/15/2023  NULL    NULL
135 06/15/2023  NULL    NULL
136 06/15/2023  NULL    NULL

For clarity I will provide the XML for an application with only the primary type applicant.

<LoanExport>
   <Application>
      <GeneratedDate>06/15/2023</GeneratedDate>
      <BusinessApplication/>
      <ApplicationNumber>134</ApplicationNumber>
      <ApplicationType>Individual</ApplicationType>
      <Business>false</Business>
      <Applicants>
         <Applicant>
            <Type>primary</Type>
            <FirstName>Robert</FirstName>
            <LastName>Ice</LastName>
            <MiddleName/>
            <NameSuffix/>
            <DateOfBirth>02/22/1992</DateOfBirth>
            <CreditScore>0</CreditScore>
            <SocialSecurityNumber>666423221</SocialSecurityNumber>
            <IsUSCitizen>true</IsUSCitizen>
            <PhoneNumber>5555555555</PhoneNumber>
            <WorkPhoneNumber/>
            <CellularPhoneNumber>3334445555</CellularPhoneNumber>
            <EmailAddress>demo4quq2vs@tcicredit.com</EmailAddress>
            <DriversLicense/>
            <DriversLicenseState/>
            <DriversLicenseExpirationDate/>
            <DriversLicenseIssueDate/>
            <MaritalStatus>false</MaritalStatus>
            <Addresses>
               <Address>
                  <Type>Primary</Type>
                  <Address>
                     <Number>1</Number>
                     <StreetPrefix/>
                     <StreetName>Hawk Drive</StreetName>
                     <StreetType/>
                     <StreetSuffix/>
                     <ApartmentOrSuiteNumber/>
                     <PostalCode>60750</PostalCode>
                     <City>FANTASY ISLAND</City>
                     <State>MD</State>
                  </Address>
                  <TimeAtResidenceYears>5</TimeAtResidenceYears>
                  <TimeAtResidenceMonths>7</TimeAtResidenceMonths>
                  <Ownership>Own</Ownership>
                  <MonthlyPayment currency="USD">550.00</MonthlyPayment>
               </Address>
            </Addresses>
            <Employments>
               <Employment>
                  <isCurrentEmployment>true</isCurrentEmployment>
                  <Status>Employed</Status>
                  <EmployerName>TCI</EmployerName>
                  <ContactName/>
                  <ContactPhone>1111111111</ContactPhone>
                  <JobPosition>Business Analyst</JobPosition>
                  <Address>
                     <StreetName/>
                     <PostalCode/>
                     <City/>
                     <State/>
                  </Address>
                  <TimeAtEmployerYears>10</TimeAtEmployerYears>
                  <TimeAtEmployerMonths>6</TimeAtEmployerMonths>
                  <GrossMonthlyIncome currency="USD">0.00</GrossMonthlyIncome>
               </Employment>
            </Employments>
            <Stipulations>
               <Stipulation>
                  <Description>VOI Applicant</Description>
                  <Status>Active</Status>
               </Stipulation>
            </Stipulations>
         </Applicant>
      </Applicants>
  • I will add that in a moment of clarity I am realizing I can run 2 separate copy activities. 1 for primary applicants and 2 for coborrower applicants. I could then merge based on the application_number later. I would like to avoid this however because each XML LoanExport record is >5000 lines and there are a few dozen occurrences of this architecture that would quickly make that tedious. – TylerSimpson Jun 22 '23 at 19:17
  • Azure SQL DB is pretty capable with json and I’ve answered on this pattern a few times: land the xml in a table then use T-SQL to process it, eg similar to here: https://stackoverflow.com/a/66528307/1527504 – wBob Jun 26 '23 at 15:16
  • @wBob Thank you for your response. I looked at this approach early on, but I encountered difficulties getting the entire XML to a variable in my SQL DB. I followed a tutorial that used sql server table types but ultimately encountered issues with limitation on the ADF lookup activity (5000 rows) as well as variable data types. For example, it seemed my only option was to convert to text then back to XML which never quite worked right. It may be worth me looking back into looping to get around the 5000 row limitation. Do you have other suggestions? – TylerSimpson Jun 26 '23 at 19:34
  • Yes the technique involves OPENROWSET, to get the xml into a variable or table and then use the xml data type methods like node, value, query etc. Similar to this: https://stackoverflow.com/a/43707437/1527504 – wBob Jun 26 '23 at 20:35
  • I’m AFK this week but can do up an answer closer to your xml next week. I think you’ve got what you need though, forget about ADF and xml, it’s awful! – wBob Jun 26 '23 at 20:36
  • 1
    @wBob If you have the free time next week to drop an example, I would greatly appreciate it. Please don't feel obliged I already appreciate your help. XML in ADF truly is awful! I currently have a solution in place using 3 separate copy activities where I specify no applicant index, applicant index [0] and applicant index [1] and merge at silver schema later. This way I can get the individual primary applicant, joint primary applicant, and joint coborrower applicant respectively. – TylerSimpson Jun 27 '23 at 12:07

2 Answers2

0
  • I have achieved the requirement using azure dataflows. The following is the xml file data that I used:
<LoanExport>
   <Application>
      <GeneratedDate>06/15/2023</GeneratedDate>
      <BusinessApplication/>
      <ApplicationNumber>132</ApplicationNumber>
      <ApplicationType>Joint</ApplicationType>
      <Business>false</Business>
      <Applicants>
         <Applicant>
            <Type>primary</Type>
            <FirstName>Robert</FirstName>
            <LastName>Ice</LastName>
            <MiddleName/>
            <NameSuffix/>
            <DateOfBirth>02/22/1992</DateOfBirth>
            <CreditScore/>
            <SocialSecurityNumber>666423221</SocialSecurityNumber>
            <IsUSCitizen>true</IsUSCitizen>
            <PhoneNumber>5555555555</PhoneNumber>
            <WorkPhoneNumber/>
            <CellularPhoneNumber>3334445555</CellularPhoneNumber>
            <EmailAddress>demomm91zft@tcicredit.com</EmailAddress>
            <DriversLicense/>
            <DriversLicenseState/>
            <DriversLicenseExpirationDate/>
            <DriversLicenseIssueDate/>
            <MaritalStatus>false</MaritalStatus>
            <Addresses>
               <Address>
                  <Type>Primary</Type>
                  <Address>
                     <Number>1</Number>
                     <StreetPrefix/>
                     <StreetName>Hawk Drive</StreetName>
                     <StreetType/>
                     <StreetSuffix/>
                     <ApartmentOrSuiteNumber/>
                     <PostalCode>60750</PostalCode>
                     <City>FANTASY ISLAND</City>
                     <State>MD</State>
                  </Address>
                  <TimeAtResidenceYears>5</TimeAtResidenceYears>
                  <TimeAtResidenceMonths>7</TimeAtResidenceMonths>
                  <Ownership>Own</Ownership>
                  <MonthlyPayment currency="USD">550.00</MonthlyPayment>
               </Address>
            </Addresses>
            <Employments>
               <Employment>
                  <isCurrentEmployment>true</isCurrentEmployment>
                  <Status>Employed</Status>
                  <EmployerName>TCI</EmployerName>
                  <ContactName/>
                  <ContactPhone>1111111111</ContactPhone>
                  <JobPosition>Business Analyst</JobPosition>
                  <Address>
                     <StreetName/>
                     <PostalCode/>
                     <City/>
                     <State/>
                  </Address>
                  <TimeAtEmployerYears>10</TimeAtEmployerYears>
                  <TimeAtEmployerMonths>6</TimeAtEmployerMonths>
                  <GrossMonthlyIncome currency="USD">0.00</GrossMonthlyIncome>
               </Employment>
            </Employments>
            <DenialReasons>
               <DenialReason>
                  <Reason>Applicant has no SS number</Reason>
               </DenialReason>
            </DenialReasons>
         </Applicant>
         <Applicant>
            <Type>coBorrower</Type>
            <FirstName>asd</FirstName>
            <LastName>asd</LastName>
            <MiddleName>asd</MiddleName>
            <NameSuffix>IV</NameSuffix>
            <DateOfBirth>07/17/1973</DateOfBirth>
            <CreditScore/>
            <SocialSecurityNumber>333333333</SocialSecurityNumber>
            <IsUSCitizen>true</IsUSCitizen>
            <PhoneNumber>1231231231</PhoneNumber>
            <WorkPhoneNumber/>
            <CellularPhoneNumber>1231231231</CellularPhoneNumber>
            <EmailAddress>dddd@yahoo.com</EmailAddress>
            <DriversLicense/>
            <DriversLicenseState/>
            <DriversLicenseExpirationDate/>
            <DriversLicenseIssueDate/>
            <MaritalStatus>false</MaritalStatus>
            <Addresses>
               <Address>
                  <Type>Primary</Type>
                  <Address>
                     <Number/>
                     <StreetPrefix/>
                     <StreetName>asdasd</StreetName>
                     <StreetType/>
                     <StreetSuffix/>
                     <ApartmentOrSuiteNumber/>
                     <PostalCode>12345</PostalCode>
                     <City>Schenectady</City>
                     <State>NY</State>
                  </Address>
                  <TimeAtResidenceYears>17</TimeAtResidenceYears>
                  <TimeAtResidenceMonths>11</TimeAtResidenceMonths>
                  <Ownership>Living with relatives</Ownership>
                  <MonthlyPayment currency="USD">2222.00</MonthlyPayment>
               </Address>
            </Addresses>
            <Employments>
               <Employment>
                  <isCurrentEmployment>true</isCurrentEmployment>
                  <Status>Retired</Status>
                  <EmployerName/>
                  <ContactName/>
                  <ContactPhone/>
                  <JobPosition/>
                  <Address>
                     <StreetName/>
                     <PostalCode/>
                     <City/>
                     <State/>
                  </Address>
                  <TimeAtEmployerYears>0</TimeAtEmployerYears>
                  <TimeAtEmployerMonths>0</TimeAtEmployerMonths>
                  <GrossMonthlyIncome currency="USD">0.00</GrossMonthlyIncome>
               </Employment>
            </Employments>
            <DenialReasons>
               <DenialReason>
                  <Reason>Applicant has no SS number</Reason>
               </DenialReason>
            </DenialReasons>
         </Applicant>
      </Applicants>
    </Application>
    <Application>
      <GeneratedDate>06/15/2023</GeneratedDate>
      <BusinessApplication/>
      <ApplicationNumber>134</ApplicationNumber>
      <ApplicationType>Individual</ApplicationType>
      <Business>false</Business>
      <Applicants>
         <Applicant>
            <Type>primary</Type>
            <FirstName>Robert</FirstName>
            <LastName>Ice</LastName>
            <MiddleName/>
            <NameSuffix/>
            <DateOfBirth>02/22/1992</DateOfBirth>
            <CreditScore>0</CreditScore>
            <SocialSecurityNumber>666423221</SocialSecurityNumber>
            <IsUSCitizen>true</IsUSCitizen>
            <PhoneNumber>5555555555</PhoneNumber>
            <WorkPhoneNumber/>
            <CellularPhoneNumber>3334445555</CellularPhoneNumber>
            <EmailAddress>demo4quq2vs@tcicredit.com</EmailAddress>
            <DriversLicense/>
            <DriversLicenseState/>
            <DriversLicenseExpirationDate/>
            <DriversLicenseIssueDate/>
            <MaritalStatus>false</MaritalStatus>
            <Addresses>
               <Address>
                  <Type>Primary</Type>
                  <Address>
                     <Number>1</Number>
                     <StreetPrefix/>
                     <StreetName>Hawk Drive</StreetName>
                     <StreetType/>
                     <StreetSuffix/>
                     <ApartmentOrSuiteNumber/>
                     <PostalCode>60750</PostalCode>
                     <City>FANTASY ISLAND</City>
                     <State>MD</State>
                  </Address>
                  <TimeAtResidenceYears>5</TimeAtResidenceYears>
                  <TimeAtResidenceMonths>7</TimeAtResidenceMonths>
                  <Ownership>Own</Ownership>
                  <MonthlyPayment currency="USD">550.00</MonthlyPayment>
               </Address>
            </Addresses>
            <Employments>
               <Employment>
                  <isCurrentEmployment>true</isCurrentEmployment>
                  <Status>Employed</Status>
                  <EmployerName>TCI</EmployerName>
                  <ContactName/>
                  <ContactPhone>1111111111</ContactPhone>
                  <JobPosition>Business Analyst</JobPosition>
                  <Address>
                     <StreetName/>
                     <PostalCode/>
                     <City/>
                     <State/>
                  </Address>
                  <TimeAtEmployerYears>10</TimeAtEmployerYears>
                  <TimeAtEmployerMonths>6</TimeAtEmployerMonths>
                  <GrossMonthlyIncome currency="USD">0.00</GrossMonthlyIncome>
               </Employment>
            </Employments>
            <Stipulations>
               <Stipulation>
                  <Description>VOI Applicant</Description>
                  <Status>Active</Status>
               </Stipulation>
            </Stipulations>
         </Applicant>
      </Applicants>
    </Application>
</LoanExport>
  • There are multiple operations used like conditional split (on application type) and then operations to get the data in required format.
  • Once we extracted the required values, I have used the union by name to get the entire data as required. The following is the entire dataflow JSON that I used:
{
    "name": "dataflow1",
    "properties": {
        "type": "MappingDataFlow",
        "typeProperties": {
            "sources": [
                {
                    "dataset": {
                        "referenceName": "Xml1",
                        "type": "DatasetReference"
                    },
                    "name": "source1"
                }
            ],
            "sinks": [
                {
                    "name": "sink1"
                }
            ],
            "transformations": [
                {
                    "name": "flatten1"
                },
                {
                    "name": "split1"
                },
                {
                    "name": "flatten2"
                },
                {
                    "name": "derivedColumn1"
                },
                {
                    "name": "select1"
                },
                {
                    "name": "select2"
                },
                {
                    "name": "derivedColumn2"
                },
                {
                    "name": "aggregate1"
                },
                {
                    "name": "derivedColumn3"
                },
                {
                    "name": "select3"
                },
                {
                    "name": "union1"
                }
            ],
            "scriptLines": [
                "source(output(",
                "          LoanExport as (Application as (Applicants as (Applicant as (Addresses as (Address as (Address as (ApartmentOrSuiteNumber as string, City as string, Number as boolean, PostalCode as integer, State as string, StreetName as string, StreetPrefix as string, StreetSuffix as string, StreetType as string), MonthlyPayment as ({@currency} as string, {_value_} as double), Ownership as string, TimeAtResidenceMonths as short, TimeAtResidenceYears as short, Type as string)), CellularPhoneNumber as long, CreditScore as boolean, DateOfBirth as date, DenialReasons as (DenialReason as (Reason as string)), DriversLicense as string, DriversLicenseExpirationDate as string, DriversLicenseIssueDate as string, DriversLicenseState as string, EmailAddress as string, Employments as (Employment as (Address as (City as string, PostalCode as string, State as string, StreetName as string), ContactName as string, ContactPhone as integer, EmployerName as string, GrossMonthlyIncome as ({@currency} as string, {_value_} as double), JobPosition as string, Status as string, TimeAtEmployerMonths as short, TimeAtEmployerYears as short, isCurrentEmployment as boolean)), FirstName as string, IsUSCitizen as boolean, LastName as string, MaritalStatus as boolean, MiddleName as string, NameSuffix as string, PhoneNumber as long, SocialSecurityNumber as integer, Stipulations as (Stipulation as (Description as string, Status as string)), Type as string, WorkPhoneNumber as string)[]), ApplicationNumber as short, ApplicationType as string, Business as boolean, BusinessApplication as string, GeneratedDate as date)[])",
                "     ),",
                "     allowSchemaDrift: true,",
                "     validateSchema: false,",
                "     ignoreNoFilesFound: false,",
                "     validationMode: 'none',",
                "     namespaces: true,",
                "     dateFormats: ['MM/dd/yyyy']) ~> source1",
                "source1 foldDown(unroll(LoanExport.Application, LoanExport.Application),",
                "     mapColumn(",
                "          Applicants = LoanExport.Application.Applicants,",
                "          ApplicationNumber = LoanExport.Application.ApplicationNumber,",
                "          ApplicationType = LoanExport.Application.ApplicationType,",
                "          Business = LoanExport.Application.Business,",
                "          BusinessApplication = LoanExport.Application.BusinessApplication,",
                "          GeneratedDate = LoanExport.Application.GeneratedDate",
                "     ),",
                "     skipDuplicateMapInputs: false,",
                "     skipDuplicateMapOutputs: false) ~> flatten1",
                "flatten1 split(ApplicationType=='Joint',",
                "     disjoint: false) ~> split1@(JointApplication, individualApplication)",
                "split1@JointApplication foldDown(unroll(Applicants.Applicant),",
                "     mapColumn(",
                "          Applicant = Applicants.Applicant,",
                "          ApplicationNumber,",
                "          ApplicationType,",
                "          GeneratedDate",
                "     ),",
                "     skipDuplicateMapInputs: false,",
                "     skipDuplicateMapOutputs: false) ~> flatten2",
                "flatten2 derive(first_name = Applicant.FirstName,",
                "          Type = Applicant.Type) ~> derivedColumn1",
                "derivedColumn1 select(mapColumn(",
                "          ApplicationNumber,",
                "          GeneratedDate,",
                "          first_name,",
                "          Type",
                "     ),",
                "     skipDuplicateMapInputs: true,",
                "     skipDuplicateMapOutputs: true) ~> select1",
                "split1@individualApplication select(mapColumn(",
                "          applicant_primary_first_name = Applicants.Applicant[1].FirstName,",
                "          ApplicationNumber,",
                "          GeneratedDate",
                "     ),",
                "     skipDuplicateMapInputs: true,",
                "     skipDuplicateMapOutputs: true) ~> select2",
                "select2 derive(applicant_coborrower_first_name = toString(null())) ~> derivedColumn2",
                "select1 aggregate(groupBy(ApplicationNumber,",
                "          GeneratedDate),",
                "     first_name = collect(first_name),",
                "          Type = collect(Type)) ~> aggregate1",
                "aggregate1 derive(applicant_primary_first_name = keyValues(Type,first_name)['primary'],",
                "          applicant_coborrower_first_name = keyValues(Type,first_name)['coBorrower']) ~> derivedColumn3",
                "derivedColumn3 select(mapColumn(",
                "          ApplicationNumber,",
                "          GeneratedDate,",
                "          applicant_primary_first_name,",
                "          applicant_coborrower_first_name",
                "     ),",
                "     skipDuplicateMapInputs: true,",
                "     skipDuplicateMapOutputs: true) ~> select3",
                "select3, derivedColumn2 union(byName: true)~> union1",
                "union1 sink(validateSchema: false,",
                "     skipDuplicateMapInputs: true,",
                "     skipDuplicateMapOutputs: true,",
                "     store: 'cache',",
                "     format: 'inline',",
                "     output: false,",
                "     saveOrder: 1) ~> sink1"
            ]
        }
    }
}
  • I have used cache sink which you will have to change to azure SQL database. The following is the final output that I have achieved.

enter image description here

Saideep Arikontham
  • 5,558
  • 2
  • 3
  • 11
  • Saideep, thank you for your response. This approach appears to solve my problem. I am going to hold off on marking this as the solution for a few days as I want to avoid data flows if possible. If I don't make headway on alternate approaches, I do prefer this to my current solution which utilizes separate copy activities for each index. – TylerSimpson Jun 26 '23 at 19:37
0

Azure SQL DB can read directly from blob storage using OPENROWSET. See the start of the technique here.

For your particular piece of XML, manipulate it with CROSS APPLY to drill deeper into it:

DECLARE @xml XML = '<LoanExport>
  <Application>
    <GeneratedDate>06/15/2023</GeneratedDate>
    <BusinessApplication/>
    <ApplicationNumber>132</ApplicationNumber>
    <ApplicationType>Joint</ApplicationType>
    <Business>false</Business>
    <Applicants>
      <Applicant>
        <Type>primary</Type>
        <FirstName>Robert</FirstName>
        <LastName>Ice</LastName>
        <MiddleName/>
        <NameSuffix/>
        <DateOfBirth>02/22/1992</DateOfBirth>
        <CreditScore/>
        <SocialSecurityNumber>666423221</SocialSecurityNumber>
        <IsUSCitizen>true</IsUSCitizen>
        <PhoneNumber>5555555555</PhoneNumber>
        <WorkPhoneNumber/>
        <CellularPhoneNumber>3334445555</CellularPhoneNumber>
        <EmailAddress>demomm91zft@tcicredit.com</EmailAddress>
        <DriversLicense/>
        <DriversLicenseState/>
        <DriversLicenseExpirationDate/>
        <DriversLicenseIssueDate/>
        <MaritalStatus>false</MaritalStatus>
        <Addresses>
          <Address>
            <Type>Primary</Type>
            <Address>
              <Number>1</Number>
              <StreetPrefix/>
              <StreetName>Hawk Drive</StreetName>
              <StreetType/>
              <StreetSuffix/>
              <ApartmentOrSuiteNumber/>
              <PostalCode>60750</PostalCode>
              <City>FANTASY ISLAND</City>
              <State>MD</State>
            </Address>
            <TimeAtResidenceYears>5</TimeAtResidenceYears>
            <TimeAtResidenceMonths>7</TimeAtResidenceMonths>
            <Ownership>Own</Ownership>
            <MonthlyPayment currency="USD">550.00</MonthlyPayment>
          </Address>
        </Addresses>
        <Employments>
          <Employment>
            <isCurrentEmployment>true</isCurrentEmployment>
            <Status>Employed</Status>
            <EmployerName>TCI</EmployerName>
            <ContactName/>
            <ContactPhone>1111111111</ContactPhone>
            <JobPosition>Business Analyst</JobPosition>
            <Address>
              <StreetName/>
              <PostalCode/>
              <City/>
              <State/>
            </Address>
            <TimeAtEmployerYears>10</TimeAtEmployerYears>
            <TimeAtEmployerMonths>6</TimeAtEmployerMonths>
            <GrossMonthlyIncome currency="USD">0.00</GrossMonthlyIncome>
          </Employment>
        </Employments>
        <DenialReasons>
          <DenialReason>
            <Reason>Applicant has no SS number</Reason>
          </DenialReason>
        </DenialReasons>
      </Applicant>
      <Applicant>
        <Type>coBorrower</Type>
        <FirstName>asd</FirstName>
        <LastName>asd</LastName>
        <MiddleName>asd</MiddleName>
        <NameSuffix>IV</NameSuffix>
        <DateOfBirth>07/17/1973</DateOfBirth>
        <CreditScore/>
        <SocialSecurityNumber>333333333</SocialSecurityNumber>
        <IsUSCitizen>true</IsUSCitizen>
        <PhoneNumber>1231231231</PhoneNumber>
        <WorkPhoneNumber/>
        <CellularPhoneNumber>1231231231</CellularPhoneNumber>
        <EmailAddress>dddd@yahoo.com</EmailAddress>
        <DriversLicense/>
        <DriversLicenseState/>
        <DriversLicenseExpirationDate/>
        <DriversLicenseIssueDate/>
        <MaritalStatus>false</MaritalStatus>
        <Addresses>
          <Address>
            <Type>Primary</Type>
            <Address>
              <Number/>
              <StreetPrefix/>
              <StreetName>asdasd</StreetName>
              <StreetType/>
              <StreetSuffix/>
              <ApartmentOrSuiteNumber/>
              <PostalCode>12345</PostalCode>
              <City>Schenectady</City>
              <State>NY</State>
            </Address>
            <TimeAtResidenceYears>17</TimeAtResidenceYears>
            <TimeAtResidenceMonths>11</TimeAtResidenceMonths>
            <Ownership>Living with relatives</Ownership>
            <MonthlyPayment currency="USD">2222.00</MonthlyPayment>
          </Address>
        </Addresses>
        <Employments>
          <Employment>
            <isCurrentEmployment>true</isCurrentEmployment>
            <Status>Retired</Status>
            <EmployerName/>
            <ContactName/>
            <ContactPhone/>
            <JobPosition/>
            <Address>
              <StreetName/>
              <PostalCode/>
              <City/>
              <State/>
            </Address>
            <TimeAtEmployerYears>0</TimeAtEmployerYears>
            <TimeAtEmployerMonths>0</TimeAtEmployerMonths>
            <GrossMonthlyIncome currency="USD">0.00</GrossMonthlyIncome>
          </Employment>
        </Employments>
        <DenialReasons>
          <DenialReason>
            <Reason>Applicant has no SS number</Reason>
          </DenialReason>
        </DenialReasons>
      </Applicant>
    </Applicants>
  </Application>
</LoanExport>'

--SELECT @xml;

SELECT
    l.c.value('(ApplicationNumber/text())[1]', 'INT') AS ApplicationNumber,
    l.c.value('(GeneratedDate/text())[1]', 'DATE') AS GeneratedDate,
    a.c.value('(FirstName/text())[1]', 'VARCHAR(10)') AS FirstName,
    a.c.value('(LastName/text())[1]', 'VARCHAR(10)') AS LastName,

    l.c.query('.') l,
    l.c.query('.') a

FROM @xml.nodes('LoanExport/Application') l(c)
    CROSS APPLY l.c.nodes('Applicants/Applicant') a(c);
wBob
  • 13,710
  • 3
  • 20
  • 37
  • a true legend, thank you! I was not aware this was possible in Azure SQL. Marking this as the answer as it was exactly the approach I was hoping for. – TylerSimpson Jul 04 '23 at 17:01