Please could someone help me? I have read other posts (such as "efficiently removing duplicate xml elements in c#") on how to remove duplicates from XML using C#, and adapted them to my problem, all to no avail. I'm not very experienced with XML, and all I want to do is remove the duplicate AssessmentId elements from the following XML.

I've inherited this code and can't change the structure.

Many thanks to anyone that can help.

<Request>
    <Type>Delete</Type>
    <Client>
        <ClientId></ClientId>
        <Assignment>
            <AssignmentId></AssignmentId>
            <Assessments>
                <AssessmentId>664449ba-21b9-e511-999d-d8fc934939fe</AssessmentId>
                <AssessmentId>5ea8edd4-e1b9-e511-9af1-d8fc934939fe</AssessmentId>   
                <AssessmentId>5ea8edd4-e1b9-e511-9af1-d8fc934939fe</AssessmentId>
                <AssessmentId>865a13f8-e1b9-e511-9af1-d8fc934939fe</AssessmentId>
                <AssessmentId>865a13f8-e1b9-e511-9af1-d8fc934939fe</AssessmentId>
                <AssessmentId>06439800-e2b9-e511-9af1-d8fc934939fe</AssessmentId>
                <AssessmentId>06439800-e2b9-e511-9af1-d8fc934939fe</AssessmentId>
                <AssessmentId>f683aa08-e2b9-e511-9af1-d8fc934939fe</AssessmentId>
                <AssessmentId>f683aa08-e2b9-e511-9af1-d8fc934939fe</AssessmentId>
                <AssessmentId>063f8012-e2b9-e511-9af1-d8fc934939fe</AssessmentId>
                <AssessmentId>063f8012-e2b9-e511-9af1-d8fc934939fe</AssessmentId>
                <AssessmentId>16f7c329-e2b9-e511-9af1-d8fc934939fe</AssessmentId>
                <AssessmentId>16f7c329-e2b9-e511-9af1-d8fc934939fe</AssessmentId>       
                <AssessmentId>76706838-e2b9-e511-9af1-d8fc934939fe</AssessmentId>
                <AssessmentId>76706838-e2b9-e511-9af1-d8fc934939fe</AssessmentId>
                <AssessmentId>86194741-e2b9-e511-9af1-d8fc934939fe</AssessmentId>
                <AssessmentId>86194741-e2b9-e511-9af1-d8fc934939fe</AssessmentId>
                <AssessmentId>66cf984f-e2b9-e511-9af1-d8fc934939fe</AssessmentId>
                <AssessmentId>66cf984f-e2b9-e511-9af1-d8fc934939fe</AssessmentId>
            </Assessments>
        </Assignment>
    </Client>
</Request>
DJ811
  • "Can't change the structure" because you're not familiar with it? Or can't change it because you don't have permission/access? I don't see how we can help either way. You need to either learn by acquainting yourself with the code, or get permission. –  Jan 13 '16 at 15:13
  • Do you want to remove the duplicate from your file, or from an object? – Hanlet Escaño Jan 13 '16 at 15:19
  • Thanks for taking the time to read and reply. I have access but lack LINQ/XML skills. I have tried the following code to remove the duplicate nodes, but with no success: XDocument xDoc = XDocument.Parse(xmlString); xDoc.Root.Elements("Assessments") .SelectMany(s => s.Elements("AssessmentId") .GroupBy(g => g.Value) .SelectMany(m => m.Skip(1))).Remove(); – DJ811 Jan 13 '16 at 15:21
  • It sounds like you need three steps: 1) Extract values from XML; 2) Remove the duplicates; 3) Put the result back into the XML. Tackle each part in turn, and ask a *specific* question with where you're stuck. I'd strongly recommend using LINQ to XML, which should make this pretty trivial. – Jon Skeet Jan 13 '16 at 15:22
  • @DJ811 please include the code you use to read your file in the question :) – Hanlet Escaño Jan 13 '16 at 15:26
  • Hi Hanlet, from an object. I read from a DB not a file. The xml is created on the fly – DJ811 Jan 13 '16 at 15:28
  • Hi Jon, yes, I'm trying to use LINQ and SelectMany. There is no error with my code, but no duplicates are removed either. I thought this would suffice: xDoc.Root.Elements("Assessments") .SelectMany(s => s.Elements("AssessmentId") .GroupBy(g => g.Value) .SelectMany(m => m.Skip(1))).Remove(); – DJ811 Jan 13 '16 at 15:31

3 Answers

I prefer to work with C# objects, so you can deserialize this XML into objects with XmlSerializer. You can also generate the C# classes from the XML in Visual Studio: Edit > Paste Special > Paste XML As Classes.

Your code will look like this:

        // Requires: using System.IO; using System.Linq; using System.Xml; using System.Xml.Serialization;
        // Request is the class generated from the XML (e.g. via Paste XML As Classes)
        Request request;
        var fileName = "File1.xml";

        // Deserialize the XML into objects
        var sr = new XmlSerializer(typeof(Request));
        using (var fs = new FileStream(fileName, FileMode.Open))
        {
            request = (Request)sr.Deserialize(fs);
        }

        // Keep only the distinct assessment IDs
        var distinctAssessments = request.Client.Assignment.Assessments.Distinct();
        request.Client.Assignment.Assessments = distinctAssessments.ToArray();

        // Serialize the objects back to XML and save the document
        var xmlDocument = new XmlDocument();
        using (var stream = new MemoryStream())
        {
            sr.Serialize(stream, request);
            stream.Position = 0;
            xmlDocument.Load(stream);
            xmlDocument.Save(fileName);
            // no explicit Close needed: the using block disposes the stream
        }

You can also use XSLT, but it will look a bit complex - https://msdn.microsoft.com/en-us/library/bb399419(v=vs.110).aspx

Mitklantekutli
  • Hi Mitklantekutli, thanks very much for taking the time to post an answer, I will investigate your method too. Kindest regards. – DJ811 Jan 13 '16 at 16:40

If you can change the application building the XML (it sounds like you can't), my preferred method would be to use a HashSet<string> to build up the Assessments collection. If it's coming from a SQL query, use DISTINCT or GROUP BY.
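
As a rough sketch of the HashSet<string> idea, assuming the IDs arrive as plain strings before the XML is built: the class and method names here (AssessmentXmlBuilder, BuildAssessments) are hypothetical and not part of the original application.

```csharp
using System.Collections.Generic;
using System.Xml.Linq;

public static class AssessmentXmlBuilder
{
    // Hypothetical helper: builds the <Assessments> element from raw IDs,
    // skipping any ID that has already been added (first-seen order is kept).
    public static XElement BuildAssessments(IEnumerable<string> assessmentIds)
    {
        var seen = new HashSet<string>();
        var assessments = new XElement("Assessments");

        foreach (var id in assessmentIds)
        {
            // HashSet<T>.Add returns false when the value is already in the set.
            if (seen.Add(id))
            {
                assessments.Add(new XElement("AssessmentId", id));
            }
        }

        return assessments;
    }
}
```

Calling AssessmentXmlBuilder.BuildAssessments(ids) with a list containing repeats should yield one AssessmentId element per distinct value. The default comparer is case-sensitive; if the IDs can differ only in case, passing StringComparer.OrdinalIgnoreCase to the HashSet constructor may be more appropriate.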

If you're working with the XML itself and really have no way to change how it's built, LINQ to XML with a custom IEqualityComparer should work:

string xml = @"<Request>
    <Type>Delete</Type>
    <Client>
        <ClientId></ClientId>
        <Assignment>
            <AssignmentId></AssignmentId>
            <Assessments>
                <AssessmentId>664449ba-21b9-e511-999d-d8fc934939fe</AssessmentId>
                <AssessmentId>5ea8edd4-e1b9-e511-9af1-d8fc934939fe</AssessmentId>   
                <AssessmentId>5ea8edd4-e1b9-e511-9af1-d8fc934939fe</AssessmentId>
                <AssessmentId>865a13f8-e1b9-e511-9af1-d8fc934939fe</AssessmentId>
                <AssessmentId>865a13f8-e1b9-e511-9af1-d8fc934939fe</AssessmentId>
                <AssessmentId>06439800-e2b9-e511-9af1-d8fc934939fe</AssessmentId>
                <AssessmentId>06439800-e2b9-e511-9af1-d8fc934939fe</AssessmentId>
                <AssessmentId>f683aa08-e2b9-e511-9af1-d8fc934939fe</AssessmentId>
                <AssessmentId>f683aa08-e2b9-e511-9af1-d8fc934939fe</AssessmentId>
                <AssessmentId>063f8012-e2b9-e511-9af1-d8fc934939fe</AssessmentId>
                <AssessmentId>063f8012-e2b9-e511-9af1-d8fc934939fe</AssessmentId>
                <AssessmentId>16f7c329-e2b9-e511-9af1-d8fc934939fe</AssessmentId>
                <AssessmentId>16f7c329-e2b9-e511-9af1-d8fc934939fe</AssessmentId>       
                <AssessmentId>76706838-e2b9-e511-9af1-d8fc934939fe</AssessmentId>
                <AssessmentId>76706838-e2b9-e511-9af1-d8fc934939fe</AssessmentId>
                <AssessmentId>86194741-e2b9-e511-9af1-d8fc934939fe</AssessmentId>
                <AssessmentId>86194741-e2b9-e511-9af1-d8fc934939fe</AssessmentId>
                <AssessmentId>66cf984f-e2b9-e511-9af1-d8fc934939fe</AssessmentId>
                <AssessmentId>66cf984f-e2b9-e511-9af1-d8fc934939fe</AssessmentId>
            </Assessments>
        </Assignment>
    </Client>
</Request>";

XDocument xd = XDocument.Parse(xml);
var assessments = xd.Root.Element("Client")
                         .Element("Assignment")
                         .Element("Assessments");
// get the distinct ones
var distinctEls = assessments.Elements()
                             .Distinct(new XElComparer())
                             .ToList(); // ensure we actually get the list, not just the enumerator or elements we're about to remove

// remove all children
assessments.Elements().Remove();

// add back our distinct list
assessments.Add(distinctEls);

Console.WriteLine(xd);
Console.ReadKey();

and the XElComparer:

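using System.Collections.Generic;
using System.Xml.Linq;

// Treats two elements as equal when their text values match
public class XElComparer : IEqualityComparer<XElement>
{
    public bool Equals(XElement x, XElement y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;

        return x.Value.Equals(y.Value);
    }

    public int GetHashCode(XElement obj)
    {
        if (obj == null) return 0;

        return obj.Value.GetHashCode();
    }
}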
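
For comparison, here is a sketch of an in-place variant that is closer to the query tried in the question's comments. It assumes the only problem with that attempt is that Assessments is not a direct child of the root element, so Root.Elements("Assessments") matches nothing; Descendants searches the whole tree instead. The class and method names are hypothetical.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class AssessmentDeduplicator
{
    // Hypothetical helper: parses the XML and removes every AssessmentId
    // whose value already appeared earlier under the same <Assessments>.
    public static XDocument RemoveDuplicateAssessmentIds(string xml)
    {
        XDocument doc = XDocument.Parse(xml);

        // Descendants finds <Assessments> at any depth below the root.
        doc.Descendants("Assessments")
           .SelectMany(a => a.Elements("AssessmentId")
                             .GroupBy(e => e.Value)
                             // keep the first element of each group, remove the rest
                             .SelectMany(g => g.Skip(1)))
           .Remove();

        return doc;
    }
}
```

This keeps the first occurrence of each value in its original position and needs no comparer class, at the cost of being tied to the AssessmentId element name.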
Dan Field

You can do this with a simple (or not so simple I guess) XPath query.

XmlDocument doc = new XmlDocument();
doc.LoadXml(xml); // xml in string form
// select each AssessmentId whose value does not match any earlier sibling
var nodes = doc.SelectNodes("//AssessmentId[not(. = preceding-sibling::AssessmentId)]");

That will get you a list of the unique AssessmentId nodes (the first occurrence of each value), which you can then use to rebuild the list: remove all the existing nodes and add those back. You could also remove the 'not' from the XPath query, which gives you a list of the duplicates instead, and then remove those nodes from their parent node.
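
A minimal sketch of the second option (selecting the duplicates and removing them), assuming the XML is available as a string. The class and method names are hypothetical, and copying the matches into a list before removing anything is a precaution rather than something the answer above requires.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml;

public static class XPathDeduplicator
{
    // Hypothetical helper: loads the XML and removes every AssessmentId
    // that has the same value as one of its preceding siblings.
    public static XmlDocument RemoveDuplicates(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        // Same query as above, without not(): matches only the duplicates.
        XmlNodeList matches = doc.SelectNodes(
            "//AssessmentId[. = preceding-sibling::AssessmentId]");

        // Copy the matches first so the removals below do not interfere
        // with enumerating the node list.
        List<XmlNode> duplicates = matches.Cast<XmlNode>().ToList();

        foreach (XmlNode node in duplicates)
        {
            node.ParentNode.RemoveChild(node);
        }

        return doc;
    }
}
```

After the call, the returned document's OuterXml should contain one AssessmentId per distinct value, with the first occurrences left in place.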

Jetti