8

MySQL has a nice statement: LOAD XML LOCAL INFILE

For example, if you have this table:

 CREATE TABLE person (
    person_id INT NOT NULL PRIMARY KEY,
    fname VARCHAR(40) NULL,
    lname VARCHAR(40) NULL
 );

and the following XML file called person.xml:

<list>
      <person>
          <person_id>1</person_id>
          <fname>Mikael</fname>
          <lname>Ronström</lname>
      </person>
      <person>
          <person_id>2</person_id>
          <fname>Lars</fname>
          <lname>Thalmann</lname>
      </person>
</list>

You can do this:

LOAD XML LOCAL INFILE 'person.xml'
INTO TABLE person
ROWS IDENTIFIED BY '<person>';

My question is, what if the column names were different in the XML file than they are in the table? For example:

<list>
      <person>
          <PersonId>1</PersonId>
          <FirstName>Mikael</FirstName>
          <LastName>Ronström</LastName>
      </person>
      <person>
          <PersonId>2</PersonId>
          <FirstName>Lars</FirstName>
          <LastName>Thalmann</LastName>
      </person>
</list>

How can you accomplish the same thing with a MySQL statement without manipulating the XML file? I searched everywhere but couldn't find an answer.

stepanian
  • 11,373
  • 8
  • 43
  • 63

5 Answers5

12

The fields in the XML file that don't correspond to physical column names are ignored. And columns in the table that don't have corresponding fields in the XML are set NULL.

What I'd do is load into a temp table as @Kolink suggests but with additional columns. Add a SET clause as you load the data from XML.

CREATE TEMP TABLE person_xml LIKE person;

ALTER TABLE person_xml 
  ADD COLUMN FirstName VARCHAR(40),
  ADD COLUMN LastName  VARCHAR(40),
  ADD COLUMN PersonId  INT;

LOAD XML LOCAL INFILE 'person.xml' INTO TABLE person_xml
  SET person_id = PersonId, fname = FirstName, lname = LastName;

SELECT * FROM person_xml;
+-----------+--------+-------------+-----------+-------------+----------+
| person_id | fname  | lname       | FirstName | LastName    | PersonId |
+-----------+--------+-------------+-----------+-------------+----------+
|         1 | Mikael | Ronström    | Mikael    | Ronström    |        1 |
|         2 | Lars   | Thalmann    | Lars      | Thalmann    |        2 |
+-----------+--------+-------------+-----------+-------------+----------+

Then copy to the real table, selecting a subset of columns.

INSERT INTO person SELECT person_id, fname, lname FROM person_xml;

Alternatively, drop the extra columns and use SELECT *.

ALTER TABLE person_xml 
  DROP COLUMN PersonId, 
  DROP COLUMN FirstName, 
  DROP COLUMN LastName;

INSERT INTO person SELECT * FROM person_xml;

SELECT * FROM person;

+-----------+--------+-------------+
| person_id | fname  | lname       |
+-----------+--------+-------------+
|         1 | Mikael | Ronström    |
|         2 | Lars   | Thalmann    |
+-----------+--------+-------------+
Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • Thanks. A new problem emerged. Apparently, this statement doesn't like empty elements in the form . It requires separate open and close tags. I am not sure if there is a way around that. – stepanian Dec 24 '11 at 04:06
  • These are huge files which I don't generate myself and they have to be loaded into MySQL every 15 minutes. It would be too inefficient to manipulate them. Otherwise, I would skip this entire statement and just use SAX through Java to parse them and insert the data into MySQL. – stepanian Dec 24 '11 at 04:40
  • As you wish. I'm just saying I would use XSLT. It's faster to develop and to execute transformations against XML than any coding language like procedural language. – Bill Karwin Dec 24 '11 at 05:24
  • I think I may go with your recommendation of transforming the file first with XSLT and then using the LOAD XML statement on the transformed file. – stepanian Dec 25 '11 at 00:59
  • Again, the problem I am facing is that these XML files are too large to load into memory. XSLT transformations normally load the XML into memory first. – stepanian Dec 27 '11 at 18:56
  • It sounds like you're trying to do a poor man's replication. Is there a chance you could ask for the data to be provided in CSV format? It will be less bloated than XML and easier to bulk load with LOAD DATA LOCAL INFILE, which uses positional parameters instead of exact column matches. – Butifarra Dec 27 '11 at 22:07
  • I have absolutely no control over the XML data. If I did, I would have the elements named consistently and the empty elements appear without the minimized format. Then, I could easily use the LOAD XML statement. – stepanian Dec 27 '11 at 22:25
4

A little bit hacky but working solution using the good old LOAD DATA INFILE:

LOAD DATA LOCAL INFILE '/tmp/xml/loaded.xml'
INTO TABLE person
CHARACTER SET binary
LINES STARTING BY '<person>' TERMINATED BY '</person>'
(@person)
SET
  person_id = ExtractValue(@person:=CONVERT(@person using utf8), 'PersonId'),
  fname = ExtractValue(@person, 'FirstName'),
  lname = ExtractValue(@person, 'LastName')
;

P.S. You will probably need to additionaly play with field delimiter if the data contains commas.

newtover
  • 31,286
  • 11
  • 84
  • 89
4

The following were the options available to me:

Option 1: Create a temporary table with different field names (as suggested by the other answers). This would have been a satisfactory approach. However, when I tried it, a new problem emerged: the LOAD XML statement does not, for some reason, accept minimized format empty elements (for example <person />). So, the statement failed because the XML files I need to load occasionally have empty elements in that format.

Option 2: Transform the XML file with XSLT before running the LOAD XML statement to change the element names and modify the empty element formats. This was not feasible because the XML files are very large and XSLT processing engines load the entire XML into memory before processing.

Option 3: Bypass the LOAD XML statement entirely and use a SAX parser to parse the XML file and insert the records directly into the database using JDBC and prepared statements. Even though raw JDBC and prepared statements are generally efficient, this proved to be too slow. MUCH slower than the LOAD XML statement.

Option 4: Use the LOAD DATA statement instead of the LOAD XML statement and play around with the optional clauses associated with that statement to fit my needs (e.g. lines separated by, etc.). This could have worked but would have been error prone and unstable.

Option 5: Parse the file with a fast forward-only parser and read/write XML elements simultaneously and generate a new XML file with the modified names in the desired format for the LOAD XML statement.

I ended up using option 5. I used the Java Streaming API for XML (StAX) for both reading the XML file and generating the modified XML file and then running the LOAD XML LOCAL INFILE through JDBC from inside the web application. It works perfectly and it is super fast.

stepanian
  • 11,373
  • 8
  • 43
  • 63
2

mysql table schema: organization_type(id, name)

organizationtype.xml:

<NewDataSet>
    <row>
      <ItemID>1</ItemID>
      <ItemCreatedBy>53</ItemCreatedBy>
      <ItemCreatedWhen>2014-03-10T22:53:43.947+10:00</ItemCreatedWhen>
      <ItemModifiedBy>53</ItemModifiedBy>
      <ItemModifiedWhen>2014-03-10T22:53:43.99+10:00</ItemModifiedWhen>
      <ItemOrder>1</ItemOrder>
      <ItemGUID>e2ad051f-b7ea-4feb-b91e-f558f6f632a0</ItemGUID>
      <Name>Company Type 1</Name>
    </row>

and the mysql import query will look like this:

LOAD XML INFILE '/var/lib/mysql-files/organizationtype.xml'
INTO TABLE organization_type (@ItemID, @Name) 
SET id=@ItemID, name=@Name
Gustik
  • 313
  • 1
  • 2
  • 7
2

You could create a temporary table using the column names from the XML file (although this would have to be done manually in the create temporary table query), load the XML file into that table, then insert into person select * from tmp_table_name.

Niet the Dark Absol
  • 320,036
  • 81
  • 464
  • 592
  • This is a very good work around. However, as I commented on Bill Karwin's answer, there is another problem with the LOAD XML statement. It doesn't accept minimized empty tags such as . Can you think of a solution to that issue? – stepanian Dec 27 '11 at 19:05
  • I have never worked with `LOAD XML`, so I just went with an educated guess on this answer. So no, unfortunately I don't know of a solution to this issue, other than loading the XML file in another language (eg. PHP), parseing it, then sending the resulting query to the database. – Niet the Dark Absol Dec 27 '11 at 19:10
  • 1
    FYI the failure to load tags was a bug and has been since fixed: _Prior to MySQL 5.5.46, LOAD XML did not handle empty XML elements in the form correctly. (Bug #67542, Bug #16171518)_. From [here](https://dev.mysql.com/doc/refman/5.5/en/load-xml.html) – radman May 08 '17 at 05:50