1

I'm new in DBAdministration and I have been asked to design the db structure starting from big (8Gb) xml files.

I'm testing the importing of the data from the XML into the tables, but I cannot fill more than one field at time. For the others I get null results.

Here the code:

CREATE TABLE test
(d_name     CHAR(100),
 f_name     CHAR(100),
 w       CHAR(100))

ORGANIZATION EXTERNAL
  (TYPE ORACLE_LOADER
  DEFAULT DIRECTORY DIR
 ACCESS PARAMETERS
    (
   RECORDS DELIMITED BY NEWLINE 
   NOBADFILE NODISCARDFILE NOLOGFILE
   FIELDS MISSING FIELD VALUES ARE NULL        


   REJECT ROWS WITH ALL NULL FIELDS
            (
            d_name      CHAR(100) ENCLOSED BY '<d_name>' AND '</d_name>',
            f_name         CHAR(100) ENCLOSED BY '<f_name>' AND '</f_name>',
            w      CHAR(100) ENCLOSED BY '<w>' AND '</w>'
            )
    )
  LOCATION ('xmlFILE.xml')
 )
   reject limit unlimited;

Here what I get:

d_name|f_name|w
John|null|null
Peter|null|null

Can anybody help me to understand where is the mistake?

Thanks a lot!

EDIT the XML starts like this:

<REC r_id_disclaimer="">
<UID>yyyyyyyyy</UID>
<static_data>
<summary>
<EWUID>
<WUID coll_id="WOS"/>
<edition value="WOS.AHCI"/>
</EWUID>
<pub_info coverdate="DEC 7 2014" has_abstract="N" issue="49" pubmonth="DEC 7" pubtype="Journal" pubyear="2014" sortdate="2014-12-07" special_issue="SI" vol="119">
<page begin="59" end="59" page_count="1">59-59</page>
</pub_info>
<titles count="1">
<title type="source">Title</title>
</titles>
<names count="2">
<name role="author" seq_no="1">
<d_name>John</d_name>
<f_name>John, RICHARD </f_name>
<w>jOHN, MI</w>
</name>
<name role="author" seq_no="2">
<d_name>Peter</d_name>
<f_name>Peter, RICHARD </f_name>
<w>Pet, MI</w>
</name>
</names>
newinIT
  • 15
  • 1
  • 11
  • Showing the content of `xmlFILE.xml` might be helpful for context. Why aren't you loading the data as an XMLType, and then extracting the contents as relational data using the built-in tools? – Alex Poole Aug 12 '15 at 09:07
  • Dear Alex, could you give me more information about the built-in tool? As my first try I was using the XMLTYPE, but without success... please see the other topic [link](http://stackoverflow.com/questions/31409638/importing-xml-data-in-oracle-db) – newinIT Aug 12 '15 at 09:40

1 Answers1

0

The fields are in different physical records, because of the newlines. With SQL*Loader you can concatenate or continueif to assemble logical records from physical records, but I don't think the external table driver supports that. Or loading XMLType directly.

You could load it via a CLOB field. Create a data file in the same DIR directory which has the real XML file name in it: say xmlfiles.txt which just contains the value xmlFILE.xml (but could have multiple rows, each referring to a different XML file). Then your external table would be:

create table test (xml_clob clob)
organization external
(
  type oracle_loader
  default directory dir
  access parameters
  (
    records delimited by newline
    nobadfile nodiscardfile nologfile
    fields missing field values are null        
    reject rows with all null fields
    (
      xml_filename char(80)
    )
    column transforms (xml_clob from lobfile (xml_filename) from (dir) clob)
  )
  location ('xmlfiles.txt')
)
reject limit unlimited;

Notice that the location is now the new file containing the real file's name; and the column transforms clause is using the real file name - from the xml_filename extracted from the new file - to actually load the CLOB data. You can't have a CLOB directly in the loader data file, it has to be in a separate file that is referenced.

The test table now has that XML file stored as a CLOB, and you can convert it to an XMLType to manipulate it, e.g. as part of an XMLTable clause:

select x.*
from test t
cross join xmltable('/REC/static_data/summary/names/name'
  passing xmltype(t.xml_clob)
  columns d_name varchar2(20) path 'd_name',
    f_name varchar2(20) path 'f_name',
    w varchar2(20) path 'w'
) x;

D_NAME               F_NAME               W                  
-------------------- -------------------- --------------------
John                 John, RICHARD        jOHN, MI            
Peter                Peter, RICHARD       Pet, MI             

The xmltype(t.xml_clob) is doing the conversion, so the XPath is applied to an XML document formed from the CLOB value in the table.


There is a section of the documentation that addresses loading XML content. If you get it into the DB as XML then you can manipulate it to create relational data.

Using the mechanism you tried in your previous question, and repeating some parts of that answer, you can create the staging table with an XMLType column and load from a bfile:

create table test (xml_file xmltype)
xmltype xml_file store as securefile binary xml;

insert into test (xml_file) 
select xmltype(bfilename('DIR', 'xmlFILE.xml'), nls_charset_id('WE8ISO8859P1'))
from dual;

You can then use an XMLTable clause again to get the data in a relational form:

select x.*
from test t
cross join xmltable('/REC/static_data/summary/names/name'
  passing t.xml_file 
  columns seq_no number path '@seq_no',
    d_name varchar2(20) path 'd_name',
    f_name varchar2(20) path 'f_name',
    w varchar2(20) path 'w'
) x;

    SEQ_NO D_NAME               F_NAME               W                  
---------- -------------------- -------------------- --------------------
         1 John                 John, RICHARD        jOHN, MI            
         2 Peter                Peter, RICHARD       Pet, MI             

I've included the seq_no just to show how to get an attribute value.

The XPath has to match your real file; I added the closing tags for summary, static_data and REC to make it valid, but if you also have an enclosing records level (as shown in the previous question) then you'd need to include that in the XPath, and also specify the namespace:

...
cross join xmltable(xmlnamespaces(default 'http://xxxxxxxxxxxxxx'),
  '/records/REC/static_data/summary/names/name'
  passing t.xml_file
...

If you want to get data from two levels you can pull it out as shown in your comment, but not if any of the child nodes are repeated; here you have more then one name for each REC, so you'd get ORA-19279: XPTY0004 - XQuery dynamic type mismatch: expected singleton sequence - got multi-item sequence. You could use FLWOR syntax go create new elements combining the data, but it might be simpler to extract the names node and pass that to a second XMLTable:

select x.cd_uid, y.*
from test t
cross join xmltable(xmlnamespaces(default 'http://xxxxxxxxxxxxxx'),
  '/records/REC'
  passing t.xml_file 
  columns cd_uid varchar2(10) path 'UID',
    names xmltype path 'static_data/summary/names'
) x
cross join xmltable(xmlnamespaces(default 'http://xxxxxxxxxxxxxx'),
  'names/name'
  passing x.names 
  columns seq_no number path '@seq_no',
    d_name varchar2(20) path 'd_name',
    f_name varchar2(20) path 'f_name',
    w varchar2(20) path 'w'
) y;

CD_UID         SEQ_NO D_NAME               F_NAME               W                  
---------- ---------- -------------------- -------------------- --------------------
yyyyyyyyy           1 John                 John, RICHARD        jOHN, MI            
yyyyyyyyy           2 Peter                Peter, RICHARD       Pet, MI             

The names sub-XML is extracted as x.names, and the individual elements are then pulled out of that in the y XMLTable. You can do that for multiple levels if necessary, and pull a mix of singleton and sequence data out of each, depending how complicated your XML structure is.

Alex Poole
  • 183,384
  • 11
  • 179
  • 318
  • Thank you Alex,very helpful! I'm tryng to"merge"different fields that are in different paths,but I get null values. ' select x.* from testtable2 t cross join xmltable(xmlnamespaces(default 'http://scientific.thomsonreuters.com/schema/wok5.4/public/FullRecord'), 'records/REC' passing t.xml_file columns cd_uid varchar2(200) path 'UID', d_name varchar2(200) path '/static_data/summary/names/name/d_name', f_name varchar2(200) path '/static_data/summary/names/name/f_name', w varchar2(200) path '/static_data/summary/names/name/w' ) x;' – newinIT Aug 12 '15 at 13:00
  • @newinIT - that would sort of work if you removed the `/` in front of the path starting `/static_data/...`; but then you'd get an ORA-19279 error. I've updated the answer with an example of how to nest XMLTable calls to get the related data out together. – Alex Poole Aug 12 '15 at 14:34
  • And what about the correct code to show "Title" and "Title2" in case of TitleTitle2? I've tried the below code `LB_TITLE_SOURCE varchar2(20) path '@type=source' ` but I get "true" or "false" results. Thanks! – newinIT Aug 12 '15 at 16:16
  • @newinIT - you need the full path from REC down to the title; and you need to filter on the type value: `lb_title_source varchar2(20) path 'static_data/summary/titles/title[@type="source"]'`, which with your sample gives a result of `Title`. – Alex Poole Aug 12 '15 at 16:24
  • @newinIT - you're getting a long way from your original issue; you should ask a new question if you have further problems getting the XPath right for the data you need (referring back to this for context if it's relevant). (And preferably [mark this question as answered](http://stackoverflow.com/help/someone-answers), though that's up to you!) – Alex Poole Aug 12 '15 at 16:28