0

I am working on importing an XML file from the internet into my MySQL database and I am running into problems because it contains some multivariable attributes. For example, there may be 1 "category" tag per item or 3. In database relations, this attribute should form its own table, but I am not sure how to connect things like that. Below is a shortened example of what I am dealing with.

<Library>
    <Book>
        <Author> Dave </Author>
        <Title> XML Help </Title>
        <Category> Computers </Category>
        <Category> XML </Category>
    </Book>
</Library>

I am aware of the basic syntax as below

LOAD XML LOCAL INFILE 'file.xml' INTO TABLE table ROWS IDENTIFIED BY '<Value>';

This assumes that there is only a single value for each attribute. I cannot edit the xml file because it is hundreds of thousands of lines long and I am looking to automate this process anyway. Thank you for your help.

Shadow
  • 33,525
  • 10
  • 51
  • 64
  • have you checked the [manual](https://dev.mysql.com/doc/refman/8.0/en/load-xml.html) – nbk Nov 27 '21 at 20:05
  • 1
    LOAD XML cannot load such file without additional processing. – Akina Nov 27 '21 at 20:35
  • Load xml can only be used to populate a single table at a time. Using the above xml file, you can create a books and a categories table through two runs , but it will not be able to populate the junction table between the two tables. You need to process the xml file with an external programming language. – Shadow Nov 27 '21 at 22:29

1 Answers1

0

Consider transforming your XML with XSLT, a declarative and special-purpose language like SQL, used specifically to transform XML documents. And since the mysql CLI can run shell commands using system or \!, you can call an installed XSLT processor at command line or run a prepared (and compiled) general-purpose language (Java, Python, PHP, etc.) script at command line.

Assuming you need to transform original input to the following where distinct categories are split into different <Book> nodes.

<?xml version="1.0" encoding="UTF-8"?>
<Library>
   <Book>
      <Author>Dave</Author>
      <Title>XML Help</Title>
      <Category>Computers</Category>
   </Book>
   <Book>
      <Author>Dave</Author>
      <Title>XML Help</Title>
      <Category>XML</Category>
   </Book>
</Library>

Run below XSLT. See Online Demo.

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" omit-xml-declaration="no" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <xsl:template match="node()|@*">
     <xsl:copy>
       <xsl:apply-templates select="node()|@*"/>
     </xsl:copy>
    </xsl:template>
    
    <xsl:template match="Book">
       <xsl:apply-templates select="Category"/>
    </xsl:template>
    
    <xsl:template match="Category">
     <Book>
       <xsl:apply-templates select="preceding-sibling::Author"/>
       <xsl:apply-templates select="preceding-sibling::Title"/>
       <xsl:copy>
        <xsl:apply-templates select="node()"/>
       </xsl:copy>
     </Book>
    </xsl:template>

    <xsl:template match="text()">
       <xsl:apply-templates select="normalize-space()"/>
    </xsl:template>
    
</xsl:stylesheet>

Then, call shell commands from the mysql CLI:

  • With Unix's xsltproc

    mysql> system xsltproc myScript.xsl Input.xml > Output.xml
    
  • With Windows' System.Xml.Xsl (see Powershell script here)

    mysql> system Powershell.exe -File "PS_Script.ps1" "Input.xml" "myScript.xsl" "Output.xml"
    
  • Call general-purpose languages (see XSLT scripts here):

    mysql> system java run_xslt
    mysql> system python run_xslt.py
    mysql> system php run_xslt.php
    mysql> system perl run_xslt.pl
    mysql> system Rscript run_xslt.R
    

Finally, run LOAD XML using transformed document:

LOAD XML LOCAL INFILE 'myTranformedOutput.xml' 
  INTO TABLE mytable 
  ROWS IDENTIFIED BY '<Book>';
Parfait
  • 104,375
  • 17
  • 94
  • 125