0

I'm trying to develop an efficient XML parse routine that uses a multi table insert from a data source with parent/child relationships. I want 1 parent and multiple child records inserted simultaneously. Unfortunately I'm getting the same number of parent records as I have child records.

The XML looks something like this, but with hundreds of parent elements and thousands of child elements.

<batch id="123" process_date="20171010">
    <parent id="101" name="P101" status="active">
        <child id="201" name="C201" value="xxx1" />
        <child id="202" name="C202" value="xxx2" />
        <child id="203" name="C203" value="xxx3" />
        <child id="204" name="C204" value="xxx4" />
        <child id="205" name="C205" value="xxx5" />
    </parent>
    <parent id="102" name="P102" status="active">
        <child id="211" name="C211" value="yyy1" />
        <child id="212" name="C212 value="yyy2" />
        <child id="213" name="C213" value="yyy3" />
    </parent>
    <parent id="103" name="P103" status="suspended">
        <child id="221" name="C221" value="zzz1" />
    </parent>
</batch>

These XML documents are stored in an XML column in the following table:

tblBatchUpload table (batch_upload_id int, xml xmltype)

I want to parse this xml and store it into 3 tables:

tblBatch (batch_id int, process_date varchar2(8))
tblParent (parent_id int, batch_id int, parent_name varchar2(10), parent_status varchar2(10))
tblChild (child_id int, parent_id int, child_name varchar2(10), child_value varchar2(10))

I know I can extract the batch, parent, and children from this flattened table in multiple passes, but What I really want to do is a multi table insert against this data in one pass. The problem is that my multi-table insert is inserting the same number of batch and parent records as child records when all I want is one batch record and unique parent records.

Here's my attempt at the query:

INSERT ALL
    -- always insert the child record
    WHEN 1=1 THEN
        INTO tblChild (child_id, parent_id, child_name, child_value) 
            VALUES (child_id, parent_id, child_name, child_value)
    -- only insert parents that don't already exist
    WHEN NOT EXISTS (SELECT * FROM tblParent A WHERE A.parent_id = parent_id) THEN
        INTO tblParent (parent_id, batch_id, parent_name, parent_status)
            VALUES (parent_id, batch_id, parent_name, parent_status)
    -- only insert batches that don't already exist
    WHEN NOT EXISTS (SELECT * FROM tblBatch A WHERE A.batch_id = batch_id) THEN 
        INTO tblBatch (batch_id, process_date) 
            VALUES (batch_id int, process_date)
SELECT
    t.batch_upload_id
    b.batch_id, 
    b.process_date,
    p.parent_id,
    p.parent_name,
    p.parent_status,
    c.child_id,
    c.child_name,
    c.child_value
FROM
    tbl_batch_upload t,
    XMLTABLE ( '/batch' passing t.xml 
        columns batch_id int path '@id', 
            process_date varchar2(8) path '@process_date', 
            parents XMLTYPE PATH 'Parent') b,
    XMLTABLE ( '/parent' passing b.parents 
        columns parent_id int path '@id', 
            parent_name varchar2(10) path '@name', 
            parent_status varchar2(10) path '@status', 
            children XMLTYPE PATH 'child') p
    XMLTABLE ( '/child' passing p.children 
        columns child_id int path '@id', 
            child_name varchar2(10) path '@name', 
            child_value varchar2(10) path '@value') c
WHERE
    t.batch_upload_id = :p_batch_upload_id;

How can I only insert unique batch records and unique parent records while only parsing the XML once?

Lee Greco
  • 743
  • 2
  • 11
  • 23

1 Answers1

0

I found a solution. I had to add FOR ORDINALITY columns to the XMLTABLE and base the conditional inserts for batch and parent on the ordinality of the child row:

INSERT ALL
    -- always insert the child record
    WHEN 1=1 THEN
        INTO tblChild (child_id, parent_id, child_name, child_value) 
            VALUES (child_id, parent_id, child_name, child_value)
    -- first child implies a new parent
    WHEN child_ordinal = 1 /* AND NOT EXISTS (SELECT * FROM tblParent A WHERE A.parent_id = parent_id) */ THEN
        INTO tblParent (parent_id, batch_id, parent_name, parent_status)
            VALUES (parent_id, batch_id, parent_name, parent_status)
    -- first parent first child implies a new batch
    WHEN parent_ordinal = 1 AND child_ordinal = 1 /* AND NOT EXISTS (SELECT * FROM tblBatch A WHERE A.batch_id = batch_id) */ THEN 
        INTO tblBatch (batch_id, process_date) 
            VALUES (batch_id int, process_date)
SELECT
    t.batch_upload_id
    b.batch_id, 
    b.process_date,
    p.parent_ordinal,
    p.parent_id,
    p.parent_name,
    p.parent_status,
    c.child_ordinal,
    c.child_id,
    c.child_name,
    c.child_value
FROM
    tbl_batch_upload t,
    XMLTABLE ( '/batch' passing t.xml 
        columns batch_id int path '@id', 
            process_date varchar2(8) path '@process_date', 
            parents XMLTYPE PATH 'Parent') b,
    XMLTABLE ( '/parent' passing b.parents 
        columns 
            parent_ordinal FOR ORDINALITY,
            parent_id int path '@id', 
            parent_name varchar2(10) path '@name', 
            parent_status varchar2(10) path '@status', 
            children XMLTYPE PATH 'child') p
    XMLTABLE ( '/child' passing p.children 
        columns
            child_ordinal FOR ORDINALITY, 
            child_id int path '@id', 
            child_name varchar2(10) path '@name', 
            child_value varchar2(10) path '@value') c
WHERE
    t.batch_upload_id = :p_batch_upload_id;
Lee Greco
  • 743
  • 2
  • 11
  • 23