10

I have a very basic XML structure/file on disk which is something like:

<root>
    <data timestamp="dd-mm-yyyy" type="comment">kdkdkdk</data>
    <data timestamp="dd-mm-yyyy" type="event">kdkdkdkgffgfdgf</data>
    <data timestamp="dd-mm-yyyy" type="something">kddsdsfsdkdkdk</data>
</root>

The XML will be, as mentioned, in an external file on disk. As the file might grow fairly large (it actually gets 'trimmed' every couple of weeks), I don't want to load the whole XML file just to add a new node.

Is there a way to add a new node like this? It can just be added to the top or bottom, as the process that actually uses the XML sorts it by timestamp anyway.

I'm guessing a crude way is to append the node as text. However, I think that would add the node AFTER the end tag?

Any ideas gratefully received.. David.

Dav.id

8 Answers

8

Not with any XML API or tool.

You could open the file as Text, find the Position of </root> and start overwriting from there. And of course add the </root> again.


A quick and dirty approach (not very robust):

  • make sure you prep the initial file: there should be nothing (not even whitespace) after the closing tag
  • make sure you use ASCII or UTF-8 encoding

====

string closeTag = "</root>";
int closeLength = Encoding.UTF8.GetBytes(closeTag).Length;

var fs = System.IO.File.Open(filename, System.IO.FileMode.Open);
fs.Position = fs.Length - closeLength;

var writer = new StreamWriter(fs);  // after the Position is set

writer.WriteLine(newTag);
writer.Write(closeTag);  // NOT WriteLine !!

writer.Close();
fs.Close();

And of course you know about using using() {} blocks.
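
A slightly more defensive variant of the same idea can be sketched as follows. It reads only a small window at the end of the file, finds the closing tag by comparing bytes, and overwrites from there, so trailing whitespace after the tag is tolerated. The class name `XmlTailAppender`, the method name and the 256-byte slack are made up for this illustration; the sketch assumes an ASCII or UTF-8 file whose last element is the root's closing tag, and that `newElement` is already well-formed, escaped XML.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of any library.
public static class XmlTailAppender
{
    // Inserts newElement just before the final closing tag,
    // reading only the last few bytes of the file.
    public static void AppendBeforeClosingTag(string path, string newElement, string closeTag = "</root>")
    {
        byte[] closeBytes = Encoding.UTF8.GetBytes(closeTag);

        using (var fs = new FileStream(path, FileMode.Open, FileAccess.ReadWrite, FileShare.None))
        {
            // Tail window: the tag itself plus an arbitrary 256 bytes of slack
            // for whitespace that may follow it.
            int window = (int)Math.Min(fs.Length, closeBytes.Length + 256);
            byte[] tail = new byte[window];
            fs.Seek(-window, SeekOrigin.End);
            int read = 0;
            while (read < window)
            {
                int n = fs.Read(tail, read, window - read);
                if (n == 0) break;
                read += n;
            }

            // Search backwards, byte by byte, for the closing tag.
            int idx = -1;
            for (int i = read - closeBytes.Length; i >= 0 && idx < 0; i--)
            {
                bool match = true;
                for (int j = 0; j < closeBytes.Length; j++)
                {
                    if (tail[i + j] != closeBytes[j]) { match = false; break; }
                }
                if (match) idx = i;
            }
            if (idx < 0) throw new InvalidDataException("Closing tag not found near end of file.");

            // Overwrite from the closing tag onwards, then write the tag back.
            fs.Position = fs.Length - read + idx;
            byte[] payload = Encoding.UTF8.GetBytes(newElement + Environment.NewLine + closeTag);
            fs.Write(payload, 0, payload.Length);
            fs.SetLength(fs.Position); // drop any leftover bytes after the new closing tag
        }
    }
}
```

Usage would be something like `XmlTailAppender.AppendBeforeClosingTag(filename, newTag);`. It is still a text-level hack: nothing here checks that the result is well-formed XML.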

H H
  • Yes true.. in fact amcashcow just commented above about that.. so thank you! – Dav.id Jan 23 '11 at 11:41
  • @David: NO, I meant _not_ reading it all but use Stream.Position. – H H Jan 23 '11 at 11:43
  • You can do it with an XML API (XmlReader/XmlWriter). It's not easy to implement (depending on the possible content of the XML file), but doable. If you just have basic tags and attributes like above, it wouldn't even be difficult. – TToni Jan 23 '11 at 11:47
  • @Ttoni: the difficult part is finding the start of `</root>`, and an XmlReader is not suitable for that. @David will need Encoding.GetBytes() and some byte[] comparing. – H H Jan 23 '11 at 12:13
  • @Henk Holterman: If you use an XmlReader, you can only logically find the </root> Tag. That's why I wrote Reader/Writer: You need to copy the input to a new file and insert the new Tag on the way. BTW it's one of my great complaints about XmlReader that it doesn't publish the actual stream position of the content, even though it must have that information. – TToni Jan 23 '11 at 14:07
  • @ttoni: you could only find that closing tag by reading the entire file (from the start), which is basically the same as loading it. – H H Jan 23 '11 at 14:13
  • I think what he meant is similar to http://stackoverflow.com/questions/62423/how-to-update-large-xml-file – vtd-xml-author Jan 23 '11 at 21:40
4

Somehow, you'll have to read the XML file, since you cannot just append a node to the end of the document.

An XML file can have only one root element, so the new node has to be added after the last data element but before the root's closing tag. Otherwise the XML is not well-formed.

Frederik Gheysels
  • Yes true.. I realise that is probably going to be the case.. but thought why not ask the question anyway! just in case! – Dav.id Jan 23 '11 at 11:37
2

There are a number of decent answers here already, however just to put another spin on this, are you sure this data needs to be stored as an XML file?

The data you are storing looks fairly simple (i.e. a record with three fields: date, type and some string data). If you do not need any of the added benefits that XML provides (see here) then why not just use basic CSV? That way you can just keep appending to it just like a log file via File.AppendText("filename").

(Hell, it might even be possible to use something like log4net to manage the logging and any clean up housekeeping.)
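
As a rough sketch of the CSV idea, something like the following would do the appending. The `CsvLog` class and its members are invented for this example, the column order (timestamp, type, text) is an assumption based on the attributes in the question, and `File.AppendAllText` is used instead of `File.AppendText` only to keep it short.

```csharp
using System;
using System.Globalization;
using System.IO;

// Hypothetical helper, not part of any library.
public static class CsvLog
{
    // Quotes a field if it contains a comma, quote or line break,
    // doubling any embedded quotes (the usual CSV convention).
    public static string Escape(string field)
    {
        if (field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) < 0) return field;
        return "\"" + field.Replace("\"", "\"\"") + "\"";
    }

    // Appends one record as a single line; the existing file content is never read.
    public static void Append(string path, DateTime timestamp, string type, string text)
    {
        string line = string.Join(",",
            timestamp.ToString("dd-MM-yyyy", CultureInfo.InvariantCulture),
            Escape(type),
            Escape(text));
        File.AppendAllText(path, line + Environment.NewLine);
    }
}
```

A call such as `CsvLog.Append("log.csv", DateTime.Now, "comment", "kdkdkdk");` adds one line to the end of the file, and the file is created if it does not exist yet.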

Community
  • Yes, this could be a sensible way to sidestep the problem. – H H Jan 23 '11 at 12:19
  • Thanks, actually to everyone, some good insights for sure. One of the only reasons for using XML (that, and the fact it has already some plumbing in an existing system) is to be able to load the XML direct into a dataset/table to use with some existing code.. but given the comments from this question, and the latency I have for loading the data later, the CSV option is probably the way.. (although Henk's stream.position option isn't too bad either!) thanks again!! – Dav.id Jan 23 '11 at 21:36
1

Depends.

If you want to do it the "correct" way, you have to read the file and the XML in it. You can avoid loading it completely into memory by using an XmlReader class for example.

However, if you know the text layout and the encoding of the file with absolute certainty, you can avoid reading and rewriting it completely by opening it as a random-access file (FileStream), skipping to the end (minus the "</root>"), adding the new entry there and writing the "</root>" again.
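
For the first ("correct") option, a minimal sketch could look like this: stream the old file through an XmlReader into an XmlWriter on a temporary file, add the new element just before the root is closed, then swap the files. It still reads every byte, it just never holds the whole document in memory. The `XmlStreamingAppend` class, the `.tmp` suffix and the `data`/`timestamp`/`type` names (taken from the question's sample) are assumptions for illustration, and anything before the root element (comments, processing instructions) is not copied.

```csharp
using System.IO;
using System.Xml;

// Hypothetical helper, not part of any library.
public static class XmlStreamingAppend
{
    public static void AppendData(string path, string timestamp, string type, string text)
    {
        string tempPath = path + ".tmp"; // assumed scratch file next to the original
        var readerSettings = new XmlReaderSettings { IgnoreWhitespace = true };
        var writerSettings = new XmlWriterSettings { Indent = true };

        using (XmlReader reader = XmlReader.Create(path, readerSettings))
        using (XmlWriter writer = XmlWriter.Create(tempPath, writerSettings))
        {
            // Copy the root start tag and its attributes.
            reader.MoveToContent();
            bool rootIsEmpty = reader.IsEmptyElement;
            writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI);
            writer.WriteAttributes(reader, false);

            if (!rootIsEmpty)
            {
                // Copy each child in turn; WriteNode advances the reader to the
                // next sibling, so the loop stops at the root's end tag.
                reader.Read();
                while (reader.NodeType != XmlNodeType.EndElement)
                {
                    writer.WriteNode(reader, false);
                }
            }

            // The new entry goes in just before the root is closed.
            writer.WriteStartElement("data");
            writer.WriteAttributeString("timestamp", timestamp);
            writer.WriteAttributeString("type", type);
            writer.WriteString(text);
            writer.WriteEndElement();

            writer.WriteEndElement(); // closes the root element
        }

        // Both streams are closed by now, so the files can be swapped.
        File.Delete(path);
        File.Move(tempPath, path);
    }
}
```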

TToni
1

I can't see a way to do what you're after without reading the full file, but as a workaround maybe you could treat the file as plain text, i.e. without the root node (so it will not be well-formed XML on its own). Then you could just append new nodes to the file as plain text. And since your XML parsing process loads the entire file anyway, it can add the root node before treating it as XML.
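
A rough sketch of that workaround, assuming the file holds nothing but a sequence of `data` elements: the writer side only ever appends, and the reader side parses the file as an XML fragment and wraps the elements in a root. The `FragmentLog` class and its method names are made up for this example.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper, not part of any library.
public static class FragmentLog
{
    // Appends one <data> element as text; XElement takes care of escaping.
    public static void Append(string path, string timestamp, string type, string text)
    {
        var element = new XElement("data",
            new XAttribute("timestamp", timestamp),
            new XAttribute("type", type),
            text);
        File.AppendAllText(path, element.ToString(SaveOptions.DisableFormatting) + Environment.NewLine);
    }

    // Reads the rootless file and wraps its elements in a <root> element.
    public static XElement LoadWithRoot(string path)
    {
        var root = new XElement("root");
        // Fragment conformance allows several top-level elements in one file.
        var settings = new XmlReaderSettings { ConformanceLevel = ConformanceLevel.Fragment };
        using (XmlReader reader = XmlReader.Create(path, settings))
        {
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element)
                    root.Add(XNode.ReadFrom(reader)); // also moves the reader past the element
                else
                    reader.Read();                    // skip whitespace between elements
            }
        }
        return root;
    }
}
```

The consuming process would call `FragmentLog.LoadWithRoot(path)` and carry on with the resulting element as if the file had had a root all along.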

veljkoz
1
System.IO.FileInfo[] aryFi = di.GetFiles("*.xml");


foreach (System.IO.FileInfo fi in aryFi) {

 System.Xml.XmlDocument xmlDocument = new System.Xml.XmlDocument();
 xmlDocument.Load(fi.FullName);

 // Append inside the root element; inserting after it would create a second root
 XmlNode newElem = xmlDocument.CreateNode("element", "something", "");
 newElem.InnerText = "sometext";
 xmlDocument.DocumentElement.AppendChild(newElem);
 xmlDocument.Save(fi.FullName);  // write the change back to disk
}

I believe opening the file and inserting a node would be the best option. Either way you would need to use IO, so why not do it the proper way?

For a single file:

 System.Xml.XmlDocument xmlDocument = new System.Xml.XmlDocument();
 xmlDocument.Load("file path");

 // Append inside the root element; inserting after it would create a second root
 XmlNode newElem = xmlDocument.CreateNode("element", "something", "");
 newElem.InnerText = "sometext";
 xmlDocument.DocumentElement.AppendChild(newElem);
 xmlDocument.Save("file path");  // write the change back to disk
Ismail
  • That `*.xml` seems to come from nowhere. And this still loads the entire file (and some more), the question is about avoiding that. – H H Jan 23 '11 at 12:18
  • Yes that was for if you wish to perform same action on all xml files in a directory. There is code before that which gets directory info. In your case you won't be looping through all the files. I am going to add code for single file. – Ismail Jan 23 '11 at 12:20
  • If I were in this situation, since both approaches require IO calls anyway, I would prefer adding an XML node rather than text; you might have your reasons for planning to do it this way, though. – Ismail Jan 23 '11 at 12:28
1

If by not loading XML you mean not building a DOM tree, VTD-XML is the only API that allows you to cut, paste, split or modify incrementally. Furthermore, because VTD-XML is memory efficient, you won't have to worry about the size of the XML document. A similar post for Java is at How to update large XML file. Note that VTD-XML is available in Java, C#, C and C++.

Community
vtd-xml-author
  • OK, I suppose your user-name counts as disclosure. – H H Jan 23 '11 at 21:50
  • Had not heard of this.. in fact probably overkill for this solution perhaps, but definitely will check it out.. – Dav.id Jan 25 '11 at 04:08
  • @Henk: I disagree, and have been spam-flagging him (except for one answer that I upvoted). – John Saunders Feb 01 '11 at 00:46
  • 2
    @John Saunders @Henk Holterman: It may or may not count as disclosure, but it sure as heck makes it look all the more like he's here *just* to promote his product. – BoltClock Feb 01 '11 at 01:02
  • 1
    @BoltClock: I just don't think it's so hard to disclose that I excuse him for not disclosing. This has been a perennial problem with him. – John Saunders Feb 01 '11 at 01:03
-3

just use

string s = File.ReadAllText(path);
s = s.Replace("</root>", newnode + "</root>");
File.WriteAllText(path, s);  // path first, then contents
amcashcow
  • That still loads it... It just skips the XML parsing, a minor saving. The I/O is expensive. – H H Jan 23 '11 at 11:38
  • Cool.. nice simple way to add the new node.. I guess it still means reading the xml file, but then I guessed it would be the case, so thanks for that, nice clean example! – Dav.id Jan 23 '11 at 11:40
  • -1. The question is about opening large files. I'm not sure if loading a large file in memory is a best approach, when there are streams for that. – Arseni Mourzenko Jan 23 '11 at 11:44
  • i guess then you'd need to know the physical address on the disk and somehow write directly to it. if this was a table in a database i think that might avoid a read – amcashcow Jan 23 '11 at 11:47
  • -1 because (a) the OP explicitly asked for a solution that does *not* load the entire XML document, and because (b) your code will likely result in an invalid XML document (namely one with two document root nodes). – stakx - no longer contributing Jan 23 '11 at 11:52