2

I have a large XML file and need to create a partial copy of it using C#. The copy must keep the same XML structure, but should contain only the parts where a condition is true.

An example structure:

<?xml version="1.0" encoding="utf-8"?>
<country name="Netherlands">
    <region name="NH">
        <city name="Aalsmeer">
            <district name="Some district">
                <part type="slum" />
            </district>
            <contact adres="StreetName" telephone="0000000000" valid="false" />
            <residents number="10000" />
            <homes number="7000" />
        </city>
        <city name="Heemskerk">
            <district name="Some district">
                <part type="workersDistrict" />
            </district>
            <contact adres="StreetName" telephone="0000000000" valid="true" />
            <residents number="10000" />
            <homes number="7000" />
        </city>
    </region>
    <region name="ZH">
        <city name="Rotterdam">
            <district name="Some district">
                <part type="workersDistrict" />
            </district>
            <contact adres="StreetName" telephone="0000000000" valid="true" />
            <residents number="10000" />
            <homes number="7000" />
        </city>
        <city name="Moerdijk">
            <district name="Some district">
                <part type="residential area" />
            </district>
            <contact adres="StreetName" telephone="0000000000" valid="false" />
            <residents number="10000" />
            <homes number="7000" />
        </city>
    </region>
</country>

I only need the 'city' elements whose 'contact' child has the attribute 'valid' set to 'true'. The new XML file should look like this:

<?xml version="1.0" encoding="utf-8"?>
<country name="Netherlands">
    <region name="NH">
        <city name="Heemskerk">
            <district name="Some district">
                <part type="workersDistrict" />
            </district>
            <contact adres="StreetName" telephone="0000000000" valid="true" />
            <residents number="10000" />
            <homes number="7000" />
        </city>
    </region>
    <region name="ZH">
        <city name="Rotterdam">
            <district name="Some district">
                <part type="workersDistrict" />
            </district>
            <contact adres="StreetName" telephone="0000000000" valid="true" />
            <residents number="10000" />
            <homes number="7000" />
        </city>
    </region>
</country>

How do I get this done as quickly as possible, taking into account the number of city elements (e.g. 100630) and the file size (e.g. 63.0 MB)?

Mads Hansen
John Doe

2 Answers

2

Using a modified identity transform, you can match the elements that you want to suppress and give them an empty template; everything else is copied to the output unchanged.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">
    <xsl:output indent="yes" />

    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <!--suppress any city elements that do not have contact/@valid='true' -->
    <xsl:template match="city[not(contact/@valid='true')]" />

</xsl:stylesheet>

You can execute the XSLT in C# like this:

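using System;
using System.Xml;
using System.Xml.Xsl;

namespace XSLTransformation
{
    /// <summary>
    /// Applies an XSLT stylesheet to an XML file and writes the result to a new file.
    /// </summary>
    class Class1
    {
        static void Main(string[] args)
        {
            // Note: XslTransform is reported as obsolete by the compiler (see the comments below).
            XslTransform myXslTransform = new XslTransform();

            // Load the stylesheet; the file names come from the sample this code was copied from.
            myXslTransform.Load("books.xsl");

            // Transform the input file and write the result to the output file.
            myXslTransform.Transform("books.xml", "ISBNBookList.xml");
        }
    }
}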
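
Because XslTransform is flagged as obsolete, the same call can also be sketched with XslCompiledTransform. This is a minimal sketch under stated assumptions, not the code from the page the answer was copied from: the class name CityTransform, the method name Run, and any file names passed to it are hypothetical.

```csharp
using System.Xml.Xsl;

public static class CityTransform
{
    // Hypothetical helper: applies the stylesheet at stylesheetPath to the
    // XML file at inputPath and writes the result to outputPath.
    public static void Run(string stylesheetPath, string inputPath, string outputPath)
    {
        var transform = new XslCompiledTransform();

        // Compile the stylesheet once; the instance can be reused for further inputs.
        transform.Load(stylesheetPath);

        // Read the input file, apply the templates, and write the output file.
        transform.Transform(inputPath, outputPath);
    }
}
```

Calling CityTransform.Run("filter.xsl", "input.xml", "output.xml") (hypothetical file names) would apply the stylesheet shown above. If the output contains blank lines where city elements were suppressed, adding <xsl:strip-space elements="*"/> to the stylesheet is one option worth trying; that suggestion is an assumption of this sketch and is not confirmed by the thread.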
Mads Hansen
  • This sounds very interesting. How do I use this within C#? (I'm not very experienced ;-( ) – John Doe Oct 14 '11 at 17:03
  • I've updated the answer with sample code to invoke the transform in C#, and linked to the page it was copied from. – Mads Hansen Oct 14 '11 at 17:23
  • I tried the example, but this is my result; so without the matching city elements. Is something wrong with the XSL file? – John Doe Oct 14 '11 at 17:30
  • Whoops! Sorry about that. I've updated the answer with the corrected XPath. – Mads Hansen Oct 14 '11 at 18:21
  • I received a warning: 'System.Xml.Xsl.XslTransform' is obsolete: 'This class has been deprecated. Please use System.Xml.Xsl.XslCompiledTransform instead. http://go.microsoft.com/fwlink/?linkid=14202' When I changed 'XslTransform' to 'XslCompiledTransform', the new XML file contained a lot of empty lines (I guess from the city elements that do not have contact/@valid='true'). 'XslTransform' works fine, but the advice is to use 'XslCompiledTransform'. How can I remove the empty lines? – John Doe Oct 15 '11 at 06:41
  • Finally I found a solution. I added ****** after ****** to my stylesheet. All unnecessary whitespace is now removed. I'm not sure this is the best solution, but it works ;-)) – John Doe Oct 15 '11 at 10:37
0

I suggest you check out LINQ to XML:

Example: http://jesseliberty.com/2011/02/15/linq-to-xml/

MSDN: http://msdn.microsoft.com/en-us/library/bb387098.aspx

  • I tried LINQ to XML. As a result I got: "" How can I get the other elements (country, region, city) written to a new file? In other words, how do I get the same result in my final XML file as in my example? My code: XDocument myFile = XDocument.Load(xmlFile); var query = from c in myFile.Descendants("contact") where c.Attribute("valid").Value == "true" select c; foreach (var elem in query) { MessageBox.Show(elem.ToString()); } – John Doe Oct 14 '11 at 16:46
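
The query in the comment above selects only the contact elements, so the surrounding country, region and city structure is lost. One way to keep that structure with LINQ to XML is to work the other way round: load the document and remove the city elements that fail the condition. The following is a minimal sketch under stated assumptions, not something the answer itself spelled out; the class name CityFilterLinq and the method name KeepValidCities are hypothetical.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class CityFilterLinq
{
    // Hypothetical helper: removes every <city> that has no <contact> child
    // with valid="true". The remaining country/region/city nesting is untouched.
    public static XDocument KeepValidCities(XDocument doc)
    {
        doc.Descendants("city")
           .Where(city => !city.Elements("contact")
                               .Any(c => (string)c.Attribute("valid") == "true"))
           .Remove(); // Extensions.Remove snapshots the sequence before deleting nodes

        return doc;
    }
}
```

A call such as CityFilterLinq.KeepValidCities(XDocument.Load(xmlFile)).Save(newFile), where newFile is a hypothetical output path, would write the filtered copy. Note that XDocument loads the whole document into memory, so it is worth measuring this against the XSLT approach on a file of the size mentioned in the question.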