7

I have two XML files that share the same schema, and I would like to merge them into a single XML file. Is there an easy way to do this?

For example,

<Root>
    <LeafA>
        <Item1 />
        <Item2 />
    </LeafA>
    <LeafB>
        <Item1 />
        <Item2 />
    </LeafB>
</Root>

+

<Root>
    <LeafA>
        <Item3 />
        <Item4 />
    </LeafA>
    <LeafB>
        <Item3 />
        <Item4 />
    </LeafB>
</Root>

= new file containing

<Root>
    <LeafA>
        <Item1 />
        <Item2 />
        <Item3 />
        <Item4 />
    </LeafA>
    <LeafB>
        <Item1 />
        <Item2 />
        <Item3 />
        <Item4 />
    </LeafB>
</Root>
Rachel
  • Cut and paste within a text editor? – BoltClock May 18 '11 at 12:56
  • @BoltClock I prefer a script since these xml files are generated automatically and will change frequently. Right now my small one is about 2000 lines long and contains multiple areas that need merging. – Rachel May 18 '11 at 12:58
  • What kind of script? If there's a preferred language to write this script in, you may wish to add it to the tags. – BoltClock May 18 '11 at 12:58
  • What language are we working with? - C#? XSLT? – Jon Egerton May 18 '11 at 13:00
  • @Jon Whatever is easiest... I haven't worked with XML much so I was hoping there was some easy tool out there to combine xml files. Usually I use C# – Rachel May 18 '11 at 13:01

7 Answers

12

"Automatic XML merge" sounds like a relatively simple requirement, but once you get into the details it becomes complex quickly. Merging with C# or XSLT is much easier for a more specific task, as in the answer for the EF model. Using tools to assist with a manual merge is also an option (see this SO question).

For reference (and to give an idea of the complexity), here's an open-source example from the Java world: XML merging made easy

Back to the original question. The task specification has a few big gray areas: when two elements should be considered equivalent (same name, matching selected or all attributes, or also the same position in the parent element); how to handle the case where the original or merged XML contains multiple equivalent elements; and so on.

The code below assumes that:

  • we only care about elements at the moment
  • elements are equivalent if their element names, attribute names, and attribute values match
  • an element doesn't have multiple attributes with the same name
  • all equivalent elements from the merged document are combined with the first equivalent element in the source XML document.


// requires: using System.Linq; using System.Xml.Linq;
// determine which elements we consider the same
//
private static bool AreEquivalent(XElement a, XElement b)
{
    if(a.Name != b.Name) return false;
    if(!a.HasAttributes && !b.HasAttributes) return true;
    if(!a.HasAttributes || !b.HasAttributes) return false;
    if(a.Attributes().Count() != b.Attributes().Count()) return false;

    return a.Attributes().All(attA => b.Attributes(attA.Name)
        .Count(attB => attB.Value == attA.Value) != 0);
}

// Merge "merged" document B into "source" A
//
private static void MergeElements(XElement parentA, XElement parentB)
{
    // merge per-element content from parentB into parentA;
    // Elements() returns direct child elements only, so matches are made
    // level by level and text/comment nodes are never cast to XElement
    //
    foreach (XElement childB in parentB.Elements())
    {
        // merge childB with first equivalent childA
        // equivalent childB1, childB2,.. will be combined
        //
        bool isMatchFound = false;
        foreach (XElement childA in parentA.Elements())
        {
            if (AreEquivalent(childA, childB))
            {
                MergeElements(childA, childB);
                isMatchFound = true;
                break;
            }
        }

        // if there is no equivalent childA, add childB into parentA
        // (Add() copies childB because it already has a parent)
        //
        if (!isMatchFound) parentA.Add(childB);
    }
}

It produces the desired result with the original XML snippets, but if the input XML is more complex and contains duplicate elements, the result will be more... interesting:

public static void Test()
{
    var a = XDocument.Parse(@"
    <Root>
        <LeafA>
            <Item1 />
            <Item2 />
            <SubLeaf><X/></SubLeaf>
        </LeafA>
        <LeafB>
            <Item1 />
            <Item2 />
        </LeafB>
    </Root>");
    var b = XDocument.Parse(@"
    <Root>
        <LeafB>
            <Item5 />
            <Item1 />
            <Item6 />
        </LeafB>
        <LeafA Name=""X"">
            <Item3 />
        </LeafA>
        <LeafA>
            <Item3 />
        </LeafA>
        <LeafA>
            <SubLeaf><Y/></SubLeaf>
        </LeafA>
    </Root>");

    MergeElements(a.Root, b.Root);
    Console.WriteLine("Merged document:\n{0}", a.Root);
}

Here's the merged document, showing how equivalent elements from document B were combined:

<Root>
  <LeafA>
    <Item1 />
    <Item2 />
    <SubLeaf>
      <X />
      <Y />
    </SubLeaf>
    <Item3 />
  </LeafA>
  <LeafB>
    <Item1 />
    <Item2 />
    <Item5 />
    <Item6 />
  </LeafB>
  <LeafA Name="X">
    <Item3 />
  </LeafA>
</Root>
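
For readers without .NET at hand, here is a minimal Python sketch of the same idea under the same assumptions (elements only; equivalent means same name and same attributes; merge into the first equivalent element, comparing direct children level by level). It is a re-expression for illustration, not the code above: the names are_equivalent, merge_elements and merge_documents are made up here, and the sample strings are the two snippets from the question.

```python
import copy
import xml.etree.ElementTree as ET


def are_equivalent(a: ET.Element, b: ET.Element) -> bool:
    """Same tag and identical attribute name/value pairs."""
    return a.tag == b.tag and a.attrib == b.attrib


def merge_elements(parent_a: ET.Element, parent_b: ET.Element) -> None:
    """Merge the direct children of parent_b into parent_a, in place."""
    for child_b in parent_b:
        # Find the first equivalent direct child of parent_a, if any.
        match = next((c for c in parent_a if are_equivalent(c, child_b)), None)
        if match is not None:
            merge_elements(match, child_b)  # recurse one level down
        else:
            parent_a.append(copy.deepcopy(child_b))  # no match: copy it over


# The two documents from the question, collapsed onto one line each.
DOC_A = "<Root><LeafA><Item1 /><Item2 /></LeafA><LeafB><Item1 /><Item2 /></LeafB></Root>"
DOC_B = "<Root><LeafA><Item3 /><Item4 /></LeafA><LeafB><Item3 /><Item4 /></LeafB></Root>"


def merge_documents(xml_a: str, xml_b: str) -> ET.Element:
    """Parse both strings and merge the second document into the first."""
    root_a, root_b = ET.fromstring(xml_a), ET.fromstring(xml_b)
    merge_elements(root_a, root_b)
    return root_a


if __name__ == "__main__":
    print(ET.tostring(merge_documents(DOC_A, DOC_B), encoding="unicode"))
```

Comparing only direct children keeps an element from being matched against a same-named element at a different depth, which is one possible answer to the "same position in the parent" question raised earlier.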
DK.
1

If the format is always exactly like this, there is nothing wrong with this method:

Remove the last two lines from the first file, then append the second file without its first two lines.

Have a look at the Linux commands head and tail, which can strip the trailing and leading lines of a file.
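
As a rough illustration of that line-trimming idea, here is a small Python sketch. It assumes each file has exactly one container element, so that the last two lines of the first file are the two closing tags and the first two lines of the second file are the two opening tags. The function name concat_trimmed and the sample strings are made up for this example.

```python
def concat_trimmed(first_text: str, second_text: str, n: int = 2) -> str:
    """Drop the last n lines of first_text and the first n lines of second_text, then join them."""
    first_lines = first_text.strip().splitlines()
    second_lines = second_text.strip().splitlines()
    kept_first = first_lines[: max(len(first_lines) - n, 0)]  # everything but the closing tags
    kept_second = second_lines[n:]                            # everything but the opening tags
    return "\n".join(kept_first + kept_second) + "\n"


# Assumed sample input: a single container (LeafA) per file.
FIRST = """<Root>
    <LeafA>
        <Item1 />
        <Item2 />
    </LeafA>
</Root>"""

SECOND = """<Root>
    <LeafA>
        <Item3 />
        <Item4 />
    </LeafA>
</Root>"""

if __name__ == "__main__":
    print(concat_trimmed(FIRST, SECOND), end="")
```

This works on raw text only and knows nothing about XML, so it breaks as soon as a file contains more than one container or is formatted differently.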

Alex
  • There are multiple areas in the xml file to merge so this won't work. I'll expand my example to show that – Rachel May 18 '11 at 12:59
1

It's a simple XSLT transformation, something like this (which you apply to document a.xml):

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:variable name="docB" select="document('b.xml')"/>
  <xsl:template match="Root">
    <Root><xsl:apply-templates/></Root>
  </xsl:template>
  <!-- re-create each leaf element, then copy its children from both documents -->
  <xsl:template match="Root/LeafA">
    <LeafA>
      <xsl:copy-of select="*"/>
      <xsl:copy-of select="$docB/Root/LeafA/*"/>
    </LeafA>
  </xsl:template>
  <xsl:template match="Root/LeafB">
    <LeafB>
      <xsl:copy-of select="*"/>
      <xsl:copy-of select="$docB/Root/LeafB/*"/>
    </LeafB>
  </xsl:template>
</xsl:stylesheet>
Michael Kay
  • I do not understand how to use XSLT... can you point me to a good starting point for it? – Rachel May 18 '11 at 15:47
  • I've actually been trying to figure out XSLT for the script found here: http://www2.informatik.hu-berlin.de/~obecker/XSLT/#merge but I eventually gave up and just made my own C# script. Thanks though. – Rachel May 18 '11 at 16:38
0

One way to do it is to load each XML file into a DataSet and merge the DataSets.

    Dim dsFirst As New DataSet()
    Dim dsMerge As New DataSet()

    ' Open a FileStream for each XML file to read.
    Dim fsReadXmlFirst As New System.IO.FileStream(myXMLfileFirst, System.IO.FileMode.Open)
    Dim fsReadXmlMerge As New System.IO.FileStream(myXMLfileMerge, System.IO.FileMode.Open)

    Try
        dsFirst.ReadXml(fsReadXmlFirst)

        dsMerge.ReadXml(fsReadXmlMerge)

        Dim str As String = "Merge Table(0) Row Count = " & dsMerge.Tables(0).Rows.Count
        str = str & Chr(13) & "Merge Table(1) Row Count = " & dsMerge.Tables(1).Rows.Count
        str = str & Chr(13) & "Merge Table(2) Row Count = " & dsMerge.Tables(2).Rows.Count

        MsgBox(str)

        dsMerge.Merge(dsFirst, True)

        DataGridParent.DataSource = dsMerge
        DataGridParent.DataMember = "rulefile"

        DataGridChild.DataSource = dsMerge
        DataGridChild.DataMember = "rule"

        str = ""
        str = "Merge Table(0) Row Count = " & dsMerge.Tables(0).Rows.Count
        str = str & Chr(13) & "Merge Table(1) Row Count = " & dsMerge.Tables(1).Rows.Count
        str = str & Chr(13) & "Merge Table(2) Row Count = " & dsMerge.Tables(2).Rows.Count

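        MsgBox(str)
    Finally
        ' Release the file handles whether or not the merge succeeded.
        fsReadXmlFirst.Close()
        fsReadXmlMerge.Close()
    End Try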
Robert
0

vimdiff file_a file_b as just one example

BeyondCompare is a favorite when I'm on Windows: http://www.scootersoftware.com/

Fredrik Pihl
  • That just shows me the differences... I want to actually merge the nodes, not have them resolve differences. – Rachel May 18 '11 at 14:14
0

I ended up using C# and wrote myself a script. I knew I could do it when I asked the question, but I wanted to know if there was a faster way, since I've never really worked with XML.

The script went along these lines:

var a = new XmlDocument();
a.Load(PathToFile1);

var b = new XmlDocument();
b.Load(PathToFile2);

MergeNodes(
    a.SelectSingleNode(nodePath),
    b.SelectSingleNode(nodePath).ChildNodes,
    a);

a.Save(PathToFile1);

And MergeNodes() looked something like this:

private void MergeNodes(XmlNode parentNodeA, XmlNodeList childNodesB, XmlDocument parentA)
{
    foreach (XmlNode oNode in childNodesB)
    {
        // Skip comments
        if (oNode.Name == "#comment") continue;

        bool isFound = false;
        // Null-safe lookup: nodes without a Name attribute yield null instead of throwing
        string name = oNode.Attributes?["Name"]?.Value;

        foreach (XmlNode child in parentNodeA.ChildNodes)
        {
            if (child.Name == "#comment") continue;

            // If node already exists and is unchanged, exit loop
            // (equal OuterXml implies equal InnerXml)
            if (child.OuterXml == oNode.OuterXml)
            {
                isFound = true;
                Console.WriteLine("Found::NoChanges::" + oNode.Name + "::" + name);
                break;
            }

            // If node already exists but has been changed, replace it
            if (name != null && child.Attributes?["Name"]?.Value == name)
            {
                isFound = true;
                Console.WriteLine("Found::Replaced::" + oNode.Name + "::" + name);
                parentNodeA.ReplaceChild(parentA.ImportNode(oNode, true), child);
                // Stop scanning: the replaced node is no longer part of parentNodeA
                break;
            }
        }

        // If node does not exist, add it
        if (!isFound)
        {
            Console.WriteLine("NotFound::Adding::" + oNode.Name + "::" + name);
            parentNodeA.AppendChild(parentA.ImportNode(oNode, true));
        }
    }
}

It's not perfect - I have to manually specify the nodes I want merged - but it was quick and easy for me to put together, and since I have almost no knowledge of XML, I'm happy :)

It actually works out better that it only merges the specified nodes, since I'm using it to merge Entity Framework's edmx files and I only really want to merge the SSDL, CSDL, and MSL nodes.
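
To make the replace-or-add rule easier to see in isolation, here is a minimal Python sketch of the same idea. It is not the script above: merge_by_name is a made-up name, the sample Schema/EntityType strings are invented for illustration (they are not real edmx content), and matching on tag plus Name attribute is an assumption of this sketch, whereas the C# compares the Name attribute only.

```python
import copy
import xml.etree.ElementTree as ET


def merge_by_name(parent_a: ET.Element, parent_b: ET.Element, key: str = "Name") -> None:
    """Copy children of parent_b into parent_a: replace on a tag + key match, else append."""
    for child_b in parent_b:
        # Index of the first child of parent_a with the same tag and key attribute value.
        index = next(
            (i for i, child_a in enumerate(parent_a)
             if child_a.tag == child_b.tag and child_a.get(key) == child_b.get(key)),
            None,
        )
        if index is None:
            parent_a.append(copy.deepcopy(child_b))   # not present yet: add it
        else:
            parent_a[index] = copy.deepcopy(child_b)  # present: take B's version


# Invented sample data, loosely shaped like a schema section.
SCHEMA_A = ('<Schema><EntityType Name="Customer"><Property Name="Id" /></EntityType>'
            '<EntityType Name="Order" /></Schema>')
SCHEMA_B = ('<Schema><EntityType Name="Customer"><Property Name="Id" />'
            '<Property Name="Email" /></EntityType><EntityType Name="Invoice" /></Schema>')

if __name__ == "__main__":
    root_a, root_b = ET.fromstring(SCHEMA_A), ET.fromstring(SCHEMA_B)
    merge_by_name(root_a, root_b)
    print(ET.tostring(root_a, encoding="unicode"))
```

Replacing the whole element when the key matches keeps the logic short, at the cost of discarding anything that exists only in the first file's version of that element.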

Rachel
  • @"manually specify the nodes" - this is expected to some extent, regardless if it's c# or XSLT. Otherwise, in the original example, it's impossible to tell if merge should produce 2 leaves with 4 nodes each or 4 leaves with 2 nodes per leaf. – DK. May 20 '11 at 13:52
  • The nodes should merge if the node definition is identical. In the above example it should produce 2 leaves with 4 nodes each since the two leaf nodes are identical. – Rachel May 20 '11 at 14:17
  • Ah, I see, thanks. (And probably *identical* means *equivalent*?) In generic form, original question is a very interesting exercise, but it looks like you've already got a solution for more specific case, where elements are identified by @Name. – DK. May 20 '11 at 20:11
  • @DK I'd be interested to know if you find an easy generic way still. I only went with this solution because I was impatient and wanted this quickly – Rachel May 21 '11 at 16:21
  • I've posted code that I've played with over the week-end. It's using pretty much same idea as yours. My point is that generic task can get pretty complex, so e.g. for EF models it would be easier to go with what you already have. – DK. May 23 '11 at 21:46
0

Reposting an answer from https://www.perlmonks.org/?node_id=127848

Paste the following into a Perl script:

use strict;
require 5.000;

use Data::Dumper;
use XML::Simple;
use Hash::Merge;

my $xmlFile1 = shift || die "XmlFile1\n";
my $xmlFile2 = shift || die "XmlFile2\n";

my %config1 = %{XMLin ($xmlFile1)};
my %config2 = %{XMLin ($xmlFile2)};
my $merger = Hash::Merge->new ('RIGHT_PRECEDENT');
my %newhash = %{ $merger->merge (\%config1, \%config2) };
# XMLout (\%newhash, outputfile => "newfile", xmldecl => 1, rootname => 'config');
print XMLout (\%newhash);
LanDenLabs