I have a LINQ to XML query, but the XML document I'm querying returns a lot of duplicate element values. This hasn't caused a problem so far, because my code filters out the duplicates later on, before it builds a TreeView from the data.

However, I've noticed that my program slows down significantly when there are many system and subsystem elements with the same values. This is because my program runs a lot of code before it filters out the duplicate entries. I think it would be far more efficient to filter them out at the LINQ stage, but I have no idea how to do that.

A sample of my XML is shown below, and here is my LINQ query:

XDocument doc = XDocument.Load(CSDBpath + projectName + "\\Data.xml");

var subsys = from sub in doc.Descendants("dataModule")
             where sub.Descendants("system").First().Value == sys
             select sub.Descendants("subsystem").First().Value;

foreach (var mysub in subsys)
{
    buildSubSystemNodes(sys, mysub);
    getUnits(sys, mysub);
}

So at the moment the LINQ query can return hundreds of duplicated 'subsys' values. I need to filter these out before my foreach loop.

Here's an extract of the XML file. As you can see, all three of these entries have the same system, subsystem, and subsubsystem element values. Sometimes there are hundreds that are the same. I need to remove the duplicates. Please help!

    <DMs>
      <dataModule>
        <DMC>DMC-PO-A-32-11-00-00A-00BA-C_001.SGM</DMC>
        <techName>Main Landing Gear</techName>
        <infoName>List of support equipment (normally used in front matter)</infoName>
        <modelic>PO</modelic>
        <system>32</system>
        <subsystem>11</subsystem>
        <subsubsystem>00</subsubsystem>
        <status>Checked In</status>
        <notes>-</notes>
        <currentUser>-</currentUser>
        <validator>-</validator>
        <dateMod>-</dateMod>
        <size>-</size>
      </dataModule>
      <dataModule>
        <DMC>DMC-PO-A-32-11-00-00A-00CA-C_001.SGM</DMC>
        <techName>Main Landing Gear</techName>
        <infoName>List of supplies (normally used in front matter)</infoName>
        <modelic>PO</modelic>
        <system>32</system>
        <subsystem>11</subsystem>
        <subsubsystem>00</subsubsystem>
        <status>Checked In</status>
        <notes>-</notes>
        <currentUser>-</currentUser>
        <validator>-</validator>
        <dateMod>-</dateMod>
        <size>-</size>
      </dataModule>
      <dataModule>
        <DMC>DMC-PO-A-32-11-00-00A-005A-C_001.SGM</DMC>
        <techName>Main Landing Gear</techName>
        <infoName>Lists of abbreviations</infoName>
        <modelic>PO</modelic>
        <system>32</system>
        <subsystem>11</subsystem>
        <subsubsystem>00</subsubsystem>
        <status>Checked In</status>
        <notes>-</notes>
        <currentUser>-</currentUser>
        <validator>-</validator>
        <dateMod>-</dateMod>
        <size>-</size>
      </dataModule>
      <dataModule>
    </DMs>
Daedalus
  • Are you sure that the performance bottleneck is not in `buildSubSystemNodes` or `getUnits`? – Ilya Ivanov May 16 '13 at 16:55
  • check this question: http://stackoverflow.com/questions/4085065/xml-linq-removing-duplicate-nodes-in-xelement-c-sharp – hopper May 16 '13 at 16:57
  • @IlyaIvanov - I've been trying to debug the code all afternoon, and the only difference I can see between the XML file that the above extract came from and another one that works with no performance issues is that this one has many duplicate entries. The other file, of similar size, runs through my code and generates my TreeView in about 4 seconds. The XML file that the above was taken from takes up to 3 minutes. I'm lost as to what else it could possibly be. Thanks for the code below, I'll try it tomorrow and mark it accordingly. – Daedalus May 16 '13 at 17:23

1 Answer

Try the following code snippet; it should be more robust. But again, consider more carefully what the main cause of the slow execution actually is.

var subsys = doc.Descendants("dataModule")
                .Where(data => data.Element("system").Value == sys)
                .Select(data => data.Element("subsystem").Value)
                .Distinct();

foreach (var mysub in subsys)
{
    buildSubSystemNodes(sys, mysub);
    getUnits(sys, mysub); 
}

Note: I removed the opening <dataModule> tag at the end of your XML, since it has no matching closing tag.
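To illustrate the effect of `Distinct()`, here is a minimal, self-contained sketch. The inline XML, the class name `DistinctSubsystemsDemo`, and the helper `MatchingSubsystems` are hypothetical and stand in for `Data.xml` and the real code. The sketch also uses the explicit `(string)` cast on `XElement`, which returns null for a missing element instead of throwing the way `.Value` on a null reference would. Since `buildSubSystemNodes` and `getUnits` are not shown, it is only an assumption that the loop body is the expensive part; if it is, the loop cost scales with the number of values the query returns, which is what the two counts below compare.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class DistinctSubsystemsDemo
{
    // Hypothetical helper: returns the subsystem value of every dataModule
    // whose system element equals sys, duplicates included.
    public static List<string> MatchingSubsystems(XDocument doc, string sys)
    {
        return doc.Descendants("dataModule")
                  // (string) cast yields null when the element is missing.
                  .Where(dm => (string)dm.Element("system") == sys)
                  .Select(dm => (string)dm.Element("subsystem"))
                  .Where(sub => sub != null)
                  .ToList();
    }

    // Hypothetical sample data standing in for Data.xml.
    public static XDocument SampleDocument()
    {
        return XDocument.Parse(@"<DMs>
  <dataModule><system>32</system><subsystem>11</subsystem></dataModule>
  <dataModule><system>32</system><subsystem>11</subsystem></dataModule>
  <dataModule><system>32</system><subsystem>12</subsystem></dataModule>
  <dataModule><system>29</system><subsystem>11</subsystem></dataModule>
  <dataModule><techName>No system element</techName></dataModule>
</DMs>");
    }

    public static void Main()
    {
        var all = MatchingSubsystems(SampleDocument(), "32");
        var unique = all.Distinct().ToList();

        Console.WriteLine(all.Count);    // loop iterations without Distinct: 3
        Console.WriteLine(unique.Count); // loop iterations with Distinct: 2
    }
}
```

With hundreds of identical entries per subsystem, the same ratio would apply to every call made inside the `foreach` loop.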

Ilya Ivanov
  • Ivanov - This worked! It now loads in about 3 seconds as opposed to about 3 minutes. I'm a little confused as to why it makes such a difference, but it does. Thanks a lot. – Daedalus May 17 '13 at 07:24