
I'm looking for a way to import and export a list of changes to an XML data document (irregular structure; not naturally fitting a DataSet).

If I had a regular structure, I would use a DataTable. I could see which records have been edited, commit or cancel the changes, and transmit a packet containing only the required changes.
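For reference, the DataTable workflow described above looks roughly like the following C#. It covers editing, inspecting the changed rows, and then committing or cancelling. This is a minimal sketch: the Parts table, its columns, and the PartsTable helper are hypothetical. GetChanges, AcceptChanges and RejectChanges are standard DataTable methods.

```csharp
using System.Data;

// Hypothetical example table, used only to show DataTable's built-in change tracking.
public static class PartsTable
{
    public static DataTable Create()
    {
        var table = new DataTable("Parts");
        table.Columns.Add("Id", typeof(int));
        table.Columns.Add("Name", typeof(string));
        table.PrimaryKey = new[] { table.Columns["Id"] };
        table.Rows.Add(1, "Bolt");
        table.Rows.Add(2, "Nut");
        table.AcceptChanges(); // baseline: every row is now DataRowState.Unchanged
        return table;
    }

    // Edits one row and returns the "packet": a copy holding only the changed rows.
    public static DataTable EditAndGetChanges(DataTable table)
    {
        table.Rows.Find(2)["Name"] = "Washer"; // row 2 becomes DataRowState.Modified
        return table.GetChanges();             // returns null if nothing had changed
    }
}
```

After the packet has been sent, AcceptChanges() commits the edit. RejectChanges() would instead cancel it and restore "Nut".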

How do I do this with XML data?

If no good answer is available, I think my best bet would be a DataTable with the schema [XPath, Value], despite the inefficient storage and the navigation difficulties.
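A minimal sketch of that fallback follows. It assumes the XPath string itself is the table's primary key, so repeated edits to the same node collapse into one row. XPathChangeTable is a hypothetical name.

```csharp
using System.Data;

public static class XPathChangeTable
{
    // The [XPath, Value] schema; XPath is the key, so a later edit overwrites an earlier one.
    public static DataTable Create()
    {
        var table = new DataTable("Changes");
        table.CaseSensitive = true; // XML names are case-sensitive; DataTable's default is not
        DataColumn path = table.Columns.Add("XPath", typeof(string));
        table.Columns.Add("Value", typeof(string));
        table.PrimaryKey = new[] { path };
        return table;
    }

    // Records (or overwrites) the new value for the node addressed by 'xpath'.
    public static void Record(DataTable table, string xpath, string value)
    {
        DataRow row = table.Rows.Find(xpath);
        if (row == null)
            table.Rows.Add(xpath, value);
        else
            row["Value"] = value;
    }
}
```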

I expect to make changes to the document (with XPath or LINQ or data-bound controls or whatever), then remember the changes and send only the changes over TCP.

Then I want to receive back another change list and apply it to the XML document. I don't want to send the entire document, both because of its size and because I need to know and evaluate the changes being sent.
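On the receiving side, applying such a change list could look like the sketch below. It assumes each change is an (XPath, new value) pair that only replaces an element's text. Inserts and deletes would need their own record types. ChangeApplier is a hypothetical name. XPathSelectElement is the LINQ to XML extension method from System.Xml.XPath.

```csharp
using System.Collections.Generic;
using System.Xml.Linq;
using System.Xml.XPath;

public static class ChangeApplier
{
    // Applies (XPath, Value) pairs to 'doc'. Returns the paths that matched no element,
    // so the program can evaluate or reject those changes instead of silently dropping them.
    public static List<string> Apply(XDocument doc, IEnumerable<KeyValuePair<string, string>> changes)
    {
        var unmatched = new List<string>();
        foreach (var change in changes)
        {
            XElement target = doc.XPathSelectElement(change.Key);
            if (target == null)
                unmatched.Add(change.Key);
            else
                target.Value = change.Value; // replaces the element's content with the new text
        }
        return unmatched;
    }
}
```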

(Just to clarify: My program needs to send and receive document changes. The other end of the pipe is not based in .net, and is not part of this question.)

Alan Baljeu
  • In which part of your code did you get stuck? – L.B Aug 29 '12 at 20:55
  • Step 1: I'm stuck choosing what classes or methods to use. I don't even know if anything exists that does such or if I have to manually track and manage all changes. – Alan Baljeu Aug 29 '12 at 21:19
  • I've searched for something like this but haven't found anything yet. Another approach could be an XML-diff class that compares two documents. I still don't have a good grasp of how this is normally handled when sending the complete document/database is undesirable. – Alan Baljeu Aug 30 '12 at 15:09

7 Answers


Do you need to act on these changes, or just store them? If you just want to store the updated version, you can use a binary diff algorithm to pass the differences between two XML files, and then update the stored version with the difference. A good algorithm for this is bsdiff; the C# version can be found here.
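A real binary diff algorithm is considerably more elaborate. The underlying idea of shipping a delta instead of the whole file can still be shown with a much simpler scheme: trim the common prefix and suffix, and send only the replaced middle region. This is a hypothetical illustration (ByteDelta is not a library type). It degrades to sending most of the file when edits are scattered.

```csharp
using System;

// Hypothetical minimal delta: the changed region between a common prefix and suffix.
public sealed class ByteDelta
{
    public int Offset;          // where the changed region starts
    public int RemovedLength;   // number of old bytes that are replaced
    public byte[] Inserted;     // the replacement bytes

    public static ByteDelta Compute(byte[] oldData, byte[] newData)
    {
        int max = Math.Min(oldData.Length, newData.Length);
        int prefix = 0;
        while (prefix < max && oldData[prefix] == newData[prefix]) prefix++;

        // Bound the suffix so prefix and suffix never overlap in either array.
        int suffix = 0;
        while (suffix < max - prefix &&
               oldData[oldData.Length - 1 - suffix] == newData[newData.Length - 1 - suffix])
            suffix++;

        var inserted = new byte[newData.Length - prefix - suffix];
        Array.Copy(newData, prefix, inserted, 0, inserted.Length);
        return new ByteDelta { Offset = prefix, RemovedLength = oldData.Length - prefix - suffix, Inserted = inserted };
    }

    // Rebuilds the new version from the old bytes plus this delta.
    public byte[] Apply(byte[] oldData)
    {
        var result = new byte[oldData.Length - RemovedLength + Inserted.Length];
        Array.Copy(oldData, 0, result, 0, Offset);
        Array.Copy(Inserted, 0, result, Offset, Inserted.Length);
        int tail = oldData.Length - Offset - RemovedLength;
        Array.Copy(oldData, Offset + RemovedLength, result, Offset + Inserted.Length, tail);
        return result;
    }
}
```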

Another approach is to use this XmlDiff class from Microsoft.

MichaelT
  • Add a bit more: which binary diff algorithms? How will binary vs. XML comparison be better? How would you convert the binary diffs back into XML? – Jeremy Thompson Sep 05 '12 at 23:53
  • I agree with Jeremy. I know I can detect diffs in binary. I've never heard of anyone using them to merge changes. – Alan Baljeu Sep 06 '12 at 14:21
  • I'm not sure it fits your scenario, but think about it this way: if you have two copies of a file in two different places and someone changes one of them, all we need to do to keep the other one up to date is send the difference between the original and the changed version. That's the way Dropbox operates. – MichaelT Sep 06 '12 at 14:29
  • I understand now. It makes sense but not for my context dealing with loaded objects in different programming languages. – Alan Baljeu Sep 07 '12 at 14:03
  1. How do you propose to send only the changes?
  2. Do you expect numerous changes or just slight changes every time?
  3. What kind of changes do you have to consider?
  4. Are you trying to maintain two copies of the same document across process boundaries?
  5. How are you going to resolve conflicting changes?
  6. Are you going to lock xml documents until changes are propagated?
  7. Are both copies independent, or one is master copy?

If you use XmlDocument events such as NodeInserted, NodeRemoved, and NodeChanged, you could build a list of those changes and then replay them on another copy. If the total size of the changes is larger than the document itself, you could send the document instead. Zipping the XML data also helps.
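A minimal sketch of the event-based approach follows. XmlDocument raises NodeInserted, NodeRemoved and NodeChanged, each with an XmlNodeChangedEventArgs, so a handler can append a record for every edit. XmlChangeLog is a hypothetical helper. A real change list would store something replayable, such as an XPath, rather than a readable string.

```csharp
using System.Collections.Generic;
using System.Xml;

public static class XmlChangeLog
{
    // Subscribes to the document's change events and returns the list they append to.
    public static List<string> Attach(XmlDocument doc)
    {
        var log = new List<string>();
        doc.NodeInserted += (s, e) => log.Add("insert " + e.Node.Name + " into " + e.NewParent.Name);
        doc.NodeRemoved += (s, e) => log.Add("remove " + e.Node.Name + " from " + e.OldParent.Name);
        doc.NodeChanged += (s, e) => log.Add("change " + e.Node.Name + ": '" + e.OldValue + "' -> '" + e.NewValue + "'");
        return log;
    }
}
```

Editing a text value, removing an element and appending a new one produce three entries, in that order.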

Other than that, I don't see any easy approach.
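The zipping suggestion is easy to try with GZipStream. The sketch below compresses a payload before it goes over TCP and restores it on the other end. PacketCompression is a hypothetical name.

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

public static class PacketCompression
{
    // GZip-compresses a UTF-8 string (a change list or the whole document).
    public static byte[] Compress(string xml)
    {
        using (var buffer = new MemoryStream())
        {
            using (var gzip = new GZipStream(buffer, CompressionMode.Compress))
            {
                byte[] raw = Encoding.UTF8.GetBytes(xml);
                gzip.Write(raw, 0, raw.Length);
            } // disposing the GZipStream writes the gzip footer
            return buffer.ToArray(); // ToArray still works after the stream is closed
        }
    }

    // Reverses Compress.
    public static string Decompress(byte[] data)
    {
        using (var gzip = new GZipStream(new MemoryStream(data), CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip, Encoding.UTF8))
            return reader.ReadToEnd();
    }
}
```

Comparing the compressed size of the change list with that of the whole document is one way to decide which of the two to send.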

aiodintsov
  • Thank you for the many questions. (1) If I know a change, it's easy to send elementelement. – Alan Baljeu Sep 07 '12 at 13:56
  • 2. < 1% of the document 3. The schema is fixed during a session, but add/remove element and change value. 4. undecided, but maybe yes. 5. server evaluates changes and adjusts and sends back to client. 6. branches will be locked. 7. one is master. – Alan Baljeu Sep 07 '12 at 14:00
  • @AlanBaljeu thanks for the reward, but I'm still curious how you're going to implement it and how it will work. Could you provide an update once things have settled a bit? – aiodintsov Sep 08 '12 at 03:09

When you have XML data with an irregular structure that doesn't naturally fit a DataSet, and you want an object model to work with the data easily, you can use the XML Schema Definition Tool (Xsd.exe) with the /classes option to generate C# or VB.NET classes from an XML file.

Xsd.exe lives in:

C:\Program Files\Microsoft SDKs\Windows\v6.0A\bin\xsd.exe
C:\Program Files\Microsoft Visual Studio 8\SDK\v2.0\Bin\xsd.exe

You run xsd.exe from the Visual Studio command line (Start > All Programs > Visual Studio > Tools > Command Line).

This is the command to view all the XSD command line parameters:

xsd /?

To convert an irregular XML file (XmlResponseObject.xml) into Classes:

xsd c:\Temp\XmlResponseObject.xml /classes /language:CS /out:c:\Temp\

This generates a C# file with classes that represent the XML. You may want to refactor it into separate class files, being careful about duplicate classes in the single file that are disambiguated by namespace. Either way, the classes won't be the nicest looking, with all the XML attributes, but the good part is that you can deserialize XML straight into them. Here is an example where I retrieve XML via a REST web service; XmlResponseObject is the object model of classes that fits the XML.

using System;
using System.IO;
using System.Net;
using System.Xml.Serialization;

public interface IYourWebService
{
    XmlResponseObject GetData(int dataId);
}

public class YourWebService : IYourWebService
{
    public XmlResponseObject GetData(int dataId)
    {
        XmlResponseObject xmlResponseObject = null;
        var url = "http://SomeSite.com/Service/GetData/" + dataId;
        try
        {
            var request = WebRequest.Create(url) as HttpWebRequest;
            if (request != null)
            {
                request.AllowAutoRedirect = true;
                request.KeepAlive = true;
                request.UserAgent = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; .NET CLR 1.1.4322; InfoPath.2; .NET4.0C; .NET4.0E)";
                request.Credentials = CredentialCache.DefaultNetworkCredentials;
                request.CookieContainer = new CookieContainer();

                // Dispose the response and its stream once deserialization is done,
                // otherwise the connection is not returned to the pool.
                using (var response = request.GetResponse() as HttpWebResponse)
                {
                    if (request.HaveResponse && response != null)
                    {
                        using (var streamReader = new StreamReader(response.GetResponseStream()))
                        {
                            // Map the XML onto the xsd.exe-generated classes.
                            var xmlSerializer = new XmlSerializer(typeof(XmlResponseObject));
                            xmlResponseObject = (XmlResponseObject)xmlSerializer.Deserialize(streamReader);
                        }
                    }
                }
            }
        }
        catch (Exception ex)
        {
            string debugInfo = "\nURL: " + url;
            Console.Write(ex.Message + " " + debugInfo + " " + ex.StackTrace);
        }
        return xmlResponseObject;
    }
}

Given that you want to send and receive only document changes, you could add IsDirty flags to the classes. I'm sure that once you have the classes to work with, it will be dead easy to detect diffs.
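A minimal sketch of the IsDirty idea follows. It uses a hand-written class in place of an xsd.exe-generated one. Part and MarkClean are hypothetical names.

```csharp
using System.Xml.Serialization;

// Hypothetical stand-in for a generated class, with change tracking added by hand.
public class Part
{
    private string name;

    // True once any property has been assigned a different value; not serialized.
    [XmlIgnore]
    public bool IsDirty { get; private set; }

    public string Name
    {
        get { return name; }
        set
        {
            if (name != value) { name = value; IsDirty = true; }
        }
    }

    // Call after the change has been sent (the equivalent of AcceptChanges).
    public void MarkClean() { IsDirty = false; }
}
```

XmlSerializer assigns properties while deserializing, so the flag is also set on load. Calling MarkClean() right after deserialization gives a clean baseline.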

Jeremy Thompson
  • Interesting, but not what I'm asking for. I don't care to have C# classes reflect the XML data, and I don't care to lock the schema into my program. Finally, I don't see how this implements the DataTable's protocols for offline data to detect and examine changes. – Alan Baljeu Sep 03 '12 at 18:23
  • Thanks for the IsDirty suggestion. It led me to some new ideas. Still searching, though. – Alan Baljeu Sep 03 '12 at 18:38
  • You might need to roll your own XMLDataAdapter. You could look at the source code for the DbDataAdapter for ideas. You would need to use an awesome [XML Diff tool](http://stackoverflow.com/questions/1871076/are-there-any-free-xml-diff-merge-tools-available). – Jeremy Thompson Sep 04 '12 at 01:51
  • DbDataAdapter is a .net class. Microsoft provides sources? – Alan Baljeu Sep 04 '12 at 13:55
  • Yeah, you can even [step into the .Net BCL source code](http://referencesource.microsoft.com/netframework.aspx) when debugging. I find the RedGate Reflector tool much easier to set up and use for source stepping. You can also use Reflector on its own to decompile .NET Framework DLLs, and then use [Denis Bauer's Reflector File Disassembler](http://www.denisbauer.com/NETTools/FileDisassembler.aspx) to output the decompiled source code to projects. – Jeremy Thompson Sep 05 '12 at 00:30
  • I downloaded the Net4 sources from that link. Thanks for that tip. But how can I implement Update() "Calls the respective INSERT, UPDATE, or DELETE statements for each inserted, updated, or deleted row in the specified DataSet." given the XmlDocument doesn't track such elements? – Alan Baljeu Sep 05 '12 at 14:26

To load arbitrary XML data into a DataSet, you have to provide a corresponding schema.
See Deriving DataSet Relational Structure from XML Schema (XSD).
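For completeness, loading XML into a DataSet with an explicit schema looks roughly like the sketch below. The SchemaLoader helper is hypothetical. ReadXmlSchema and ReadXml(TextReader, XmlReadMode) are standard DataSet methods.

```csharp
using System.Data;
using System.IO;

public static class SchemaLoader
{
    // Builds the relational structure from the XSD first, then reads the data into it,
    // instead of letting ReadXml infer tables from the data.
    public static DataSet Load(string xsd, string xml)
    {
        var ds = new DataSet();
        ds.ReadXmlSchema(new StringReader(xsd));
        ds.ReadXml(new StringReader(xml), XmlReadMode.IgnoreSchema); // use the schema loaded above
        return ds;
    }
}
```

Consider an XSD whose root element is marked msdata:IsDataSet="true" (the form DataSet.WriteXmlSchema produces). With that schema, each repeating child element such as <part> maps to its own DataTable.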

Besides, DataSet/DataTable doesn't work with XML documents as such; it can only import data from, and export data to, XML.

Dennis
  • Sorry, this isn't relational data. – Alan Baljeu Sep 04 '12 at 12:24
  • @AlanBaljeu: XML itself isn't relational data by definition. It would be better if you gave a data sample; otherwise it is hard to show you the right way. I don't know of any FCL classes that can work with XML and track changes made to the XML document. – Dennis Sep 04 '12 at 12:39
  • I'm looking at 3D CAD data. So a collection of objects each with different properties, arranged into hierarchical groups and the groups also have various properties. Occasionally two objects will have the same fields, but most objects are unique. – Alan Baljeu Sep 04 '12 at 13:09
  • This answer says to use a DataSet, but I'm not looking to use a DataSet. – Alan Baljeu Sep 05 '12 at 15:37

I haven't found any usable answers anywhere. It seems that back in 2003 MS was talking about creating an XPathDocument2 or something similar that implemented what I'm asking for (books about the upcoming release mention it), but it doesn't seem to have been carried out. So here's my attempt at a solution:

Use XPathDocument/XPathNavigator, and add event handlers for Change/Delete/Insert. For each of these events, put a record in a DataTable {XPath | OldValue | NewValue} indicating the change. When ready to Commit, send the table across then clear it. If instead cancelling, use the Table info to undo the changes in the XPathDocument.
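A sketch of this plan follows, with one adjustment. XPathDocument is read-only, and the change events belong to XmlDocument (whose CreateNavigator() returns an editable XPathNavigator), so the tracker listens to XmlDocument.NodeChanged. For brevity it handles only edits to existing text values, builds positional XPaths, and assumes no namespaces. XmlValueChangeTracker is a hypothetical name.

```csharp
using System.Data;
using System.Xml;

// Hypothetical tracker: records text edits as {XPath | OldValue | NewValue} rows,
// hands them over on Commit, or undoes them on Cancel.
public sealed class XmlValueChangeTracker
{
    private readonly XmlDocument doc;
    private bool undoing;
    public DataTable Changes { get; } = new DataTable("Changes");

    public XmlValueChangeTracker(XmlDocument doc)
    {
        this.doc = doc;
        Changes.Columns.Add("XPath", typeof(string));
        Changes.Columns.Add("OldValue", typeof(string));
        Changes.Columns.Add("NewValue", typeof(string));
        doc.NodeChanged += (s, e) =>
        {
            if (!undoing && e.Node is XmlText && e.Node.ParentNode is XmlElement)
                Changes.Rows.Add(PathOf(e.Node.ParentNode), e.OldValue, e.NewValue);
        };
    }

    // Commit: return the packet to send and start a fresh change list.
    public DataTable Commit()
    {
        DataTable packet = Changes.Copy();
        Changes.Clear();
        return packet;
    }

    // Cancel: restore old values, newest change first, without recording the undo itself.
    public void Cancel()
    {
        undoing = true;
        try
        {
            for (int i = Changes.Rows.Count - 1; i >= 0; i--)
            {
                DataRow row = Changes.Rows[i];
                doc.SelectSingleNode((string)row["XPath"]).InnerText = (string)row["OldValue"];
            }
        }
        finally { undoing = false; }
        Changes.Clear();
    }

    // Builds a positional XPath such as /model[1]/part[2]/name[1] for an element.
    private static string PathOf(XmlNode node)
    {
        string path = "";
        for (; node != null && node.NodeType == XmlNodeType.Element; node = node.ParentNode)
        {
            int index = 1;
            for (XmlNode sib = node.PreviousSibling; sib != null; sib = sib.PreviousSibling)
                if (sib.NodeType == XmlNodeType.Element && sib.Name == node.Name) index++;
            path = "/" + node.Name + "[" + index + "]" + path;
        }
        return path;
    }
}
```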

I haven't implemented this yet, but it seems like it might serve.

Alan Baljeu

I have tried to find a free or open-source XML diff tool numerous times before, but never dug up anything that really fit the bill. Essentially, you're looking at tree diffing, which is a whole discipline of its own. The fact that you're using XML is secondary to this, I guess, as it's nothing but a tree in another form. You "just" need to define what identifies a node.

Though the Decomposition Algorithm for Tree Edit Distance calculates the distance between two trees, I suspect you can adapt it to give you all the changes, since those are the basis of the distance measurement. How you communicate the changes after detecting them is completely up to you; the format could range from XML to JSON. Note that the authors of the algorithm mention they wrote a Python version in a few dozen lines, so if you drop them a line, maybe they can be of assistance.
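The sketch below is a much simpler starting point than the decomposition algorithm: a naive positional diff of two LINQ to XML trees. Children are paired by position, leaf text is compared, and surplus children become inserts or deletes. It is not a tree edit distance algorithm, because a single inserted sibling shifts every later pair. It also ignores attributes. NaiveTreeDiff is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class NaiveTreeDiff
{
    // Returns readable edit records that turn 'oldEl' into 'newEl'.
    public static List<string> Diff(XElement oldEl, XElement newEl)
    {
        var edits = new List<string>();
        Walk(oldEl, newEl, "/" + oldEl.Name.LocalName, edits);
        return edits;
    }

    private static void Walk(XElement oldEl, XElement newEl, string path, List<string> edits)
    {
        // Different names, or leaf vs. non-leaf: replace the whole subtree.
        if (oldEl.Name != newEl.Name || oldEl.HasElements != newEl.HasElements)
        {
            edits.Add("replace " + path + " with " + newEl);
            return;
        }
        if (!oldEl.HasElements) // both are leaves: compare text
        {
            if (oldEl.Value != newEl.Value) edits.Add("set " + path + " = " + newEl.Value);
            return;
        }

        List<XElement> oldKids = oldEl.Elements().ToList();
        List<XElement> newKids = newEl.Elements().ToList();
        int common = Math.Min(oldKids.Count, newKids.Count);
        for (int i = 0; i < common; i++)
            Walk(oldKids[i], newKids[i], path + "/*[" + (i + 1) + "]", edits);
        for (int i = common; i < newKids.Count; i++)
            edits.Add("insert " + path + "/*[" + (i + 1) + "] " + newKids[i]);
        // Deletes run from the highest position down so earlier positions stay valid.
        for (int i = oldKids.Count - 1; i >= common; i--)
            edits.Add("delete " + path + "/*[" + (i + 1) + "]");
    }
}
```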

It looks like you could be the first one to publish a practical proof of concept if you can get this done :)

Grimace of Despair

The problem you have here is that XML is just a way of representing data; it's not necessarily the data itself. Is this some sort of XML editor you are using, or is XML just the transport?

If XML is just the transport, then when you talk about sending XML change descriptions, you probably want to generate those descriptions at the point where you make the change itself. There is every chance that the change descriptions won't use the same schema as the original data.
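A sketch of generating change descriptions at the point of change follows. Every edit goes through a wrapper that both modifies the data and appends a record to a separate <changes> document, whose schema has nothing to do with the data's. RecordingEditor and the set/delete element names are hypothetical.

```csharp
using System;
using System.Xml.Linq;
using System.Xml.XPath;

public sealed class RecordingEditor
{
    public XDocument Data { get; }
    public XElement ChangeList { get; } = new XElement("changes"); // its own, separate schema

    public RecordingEditor(XDocument data) { Data = data; }

    public void SetValue(string xpath, string value)
    {
        Find(xpath).Value = value;                 // 1. make the change...
        ChangeList.Add(new XElement("set",         // 2. ...and describe it at the same moment
            new XAttribute("path", xpath), new XAttribute("value", value)));
    }

    public void Delete(string xpath)
    {
        Find(xpath).Remove();
        ChangeList.Add(new XElement("delete", new XAttribute("path", xpath)));
    }

    private XElement Find(string xpath)
    {
        XElement target = Data.XPathSelectElement(xpath);
        if (target == null) throw new ArgumentException("No element matches " + xpath);
        return target;
    }
}
```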

In addition, the reason DataSets can do this is that each row in a DataSet has a known unique key, so a change can be sent for that row instead of for the entire set. XML doesn't work like that: elements don't have a unique key. XPath can be used as the change locator, but with enough edits that could be less efficient than sending the entire document.
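One way to mimic a DataSet's row keys is to give every element a stable key before editing starts. A change can then name its target by key rather than by position. The sketch below assumes the document carries no "id" attributes of its own. ElementKeys is a hypothetical name.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class ElementKeys
{
    // Numbers every element in document order (assumes no existing "id" attributes).
    public static void AssignIds(XDocument doc)
    {
        int next = 1;
        foreach (XElement el in doc.Descendants().ToList())
            el.SetAttributeValue("id", next++);
    }

    // Locates an element by key, wherever it currently sits in the tree.
    public static XElement Find(XDocument doc, string id)
    {
        return doc.Descendants().FirstOrDefault(e => (string)e.Attribute("id") == id);
    }
}
```

After a new first <part> is inserted, the positional path /model/part[2]/name points at a different element. The key still finds the original one.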

Why not simply treat the XML as text and use any of the standard patching algorithms? (Look at the source for Git or Hg.)
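For the text route, the core of a line-based patch is a diff between the two versions. The sketch below uses a textbook longest-common-subsequence diff. Git, for example, uses Myers' algorithm by default, so treat this as an illustration only. LineDiff is a hypothetical name. Its O(n*m) table limits it to small inputs, and it only helps if the XML is written one element per line.

```csharp
using System;
using System.Collections.Generic;

public static class LineDiff
{
    // Returns lines prefixed with "  " (kept), "- " (removed) or "+ " (added).
    public static List<string> Diff(string[] a, string[] b)
    {
        int n = a.Length, m = b.Length;
        // lcs[i, j] = length of the longest common subsequence of a[i..] and b[j..]
        var lcs = new int[n + 1, m + 1];
        for (int i = n - 1; i >= 0; i--)
            for (int j = m - 1; j >= 0; j--)
                lcs[i, j] = a[i] == b[j] ? lcs[i + 1, j + 1] + 1 : Math.Max(lcs[i + 1, j], lcs[i, j + 1]);

        var result = new List<string>();
        int x = 0, y = 0;
        while (x < n && y < m)
        {
            if (a[x] == b[y]) { result.Add("  " + a[x]); x++; y++; }
            else if (lcs[x + 1, y] >= lcs[x, y + 1]) result.Add("- " + a[x++]);
            else result.Add("+ " + b[y++]);
        }
        while (x < n) result.Add("- " + a[x++]);
        while (y < m) result.Add("+ " + b[y++]);
        return result;
    }
}
```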

AlSki