I have an XML document as input that is awfully formatted (it's a Delphi project file, if anyone cares). It has inconsistent indenting, empty lines, and strings of nodes lumped together:

<BorlandProject><Delphi.Personality><Parameters><Parameters Name="HostApplication">C:\Some\Path\Filename.exe</Parameters> <!--etc--> <Excluded_Packages>


</Excluded_Packages>

I want to reformat it into something nice. What's the easiest way to do that programmatically, with Win32/COM? If MSXML, how do I go about it?

I'd like to be able to specify the indentation unit too (a tab or several spaces).

I tried using Delphi's MSXML wrapper TXmlDocument and it does indeed delete the empty lines and indent nodes with tabs, but it does not split lines like this one:

<BorlandProject><Delphi.Personality><Parameters><Parameters Name="HostApplication">C:\Some\Path\Filename.exe</Parameters> <!--etc--> <Excluded_Packages>
himself
  • Seems to be a command line tool, but I'm new to XML/XSLT so if I'm misunderstanding something please elaborate. – himself Nov 29 '10 at 15:48

2 Answers

I tested the FormatXMLData function on a Delphi project file and it works fine: all the lines are indented correctly.

Check this code:

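uses
  XMLIntf,
  XMLDoc;

procedure FormatXMLFile(const XmlFile: string);
var
  oXml: IXMLDocument; // interface reference, released by reference counting
begin
  oXml := TXMLDocument.Create(nil);
  oXml.LoadFromFile(XmlFile);
  // Replace the document text with its re-indented form
  oXml.XML.Text := XMLDoc.FormatXMLData(oXml.XML.Text);
  // Re-activate the document so the new text is parsed before saving
  oXml.Active := True;
  oXml.SaveToFile(XmlFile);
end;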
dan-gph
RRUZ
  • IIRC, usage of TXMLDocument variables in some versions (including Delphi 2007) caused memory leaks. I have not tested it in Delphi 2009 yet, but since then I have only used IXMLDocument type variables. Maybe this was the reason to set the reference to nil in this code example instead of using Free? – mjn Nov 29 '10 at 18:35
  • RRUZ, Shouldn't oXml be defined as IXMLDocument, since you are relying on reference counting for it to be freed? Also XMLIntf would need to be added to the uses clause. – Alan Clark Nov 29 '10 at 20:51
  • Is there any reason you are using an IXMLDocument in this case (simply reading, reformatting, and writing)? The overhead of the IXMLDocument is overkill. The FormatXMLData method takes in a string and outputs a string, both of which have ways of reading and writing with less overhead. – Chris J Nov 30 '10 at 20:24
  • Chris J, I use the `IXMLDocument` to avoid problems with the encoding of the XML file. – RRUZ Nov 30 '10 at 21:34
  • Causes error when XML contains DTD. – user1580348 Apr 06 '23 at 08:57
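
The string-only approach raised in the comments above could look roughly like the sketch below. It is an illustration under assumptions, not code from either commenter: FormatXMLFileAsText is a hypothetical name, and the sketch relies on TStringList's default encoding handling, which is exactly the encoding concern RRUZ mentions.

```delphi
uses
  Classes, XMLDoc;

// Hypothetical variant: reformat the file through a plain string,
// without loading it into an IXMLDocument first.
procedure FormatXMLFileAsText(const XmlFile: string);
var
  Lines: TStringList;
begin
  Lines := TStringList.Create;
  try
    // Assumption: LoadFromFile's default encoding handling suits this file.
    Lines.LoadFromFile(XmlFile);
    Lines.Text := FormatXMLData(Lines.Text);
    Lines.SaveToFile(XmlFile);
  finally
    Lines.Free;
  end;
end;
```

If the file's encoding matters (for example, non-ASCII characters in paths), the IXMLDocument version is the safer choice.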
I used Tidy to format XML. RRUZ's method using xmlDoc.FormatXMLData works very well, and it makes sense to use it, but if your XML files happen to be big, then it may not work so well. When I tried to format a 100 MB, single-line XML file, the application crashed with an out-of-memory error on a 4GB machine, and it was very slow as well.

I used the command line version of tidy. There is also a DLL version, and there is a Delphi header file for that that you can hunt down, but I found it more convenient to run the exe via CreateProcess rather than learn the DLL API.

This is the command line I used:

tidy.exe -xml -wrap 0 -indent -quiet -o outFile.xml inFile.xml

tidy.exe is stand-alone; you don't need the DLL or anything else.
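
For illustration, launching that command line through CreateProcess might look like the minimal sketch below. RunTidy and its parameters are hypothetical names, tidy.exe is assumed to be on the PATH or next to the application, and treating any non-zero exit code as failure is an assumption that may be stricter than you need.

```delphi
uses
  Windows, SysUtils;

// Hypothetical helper: runs tidy.exe with the options shown above and
// waits for it to finish. Returns True only if the process started and
// exited with code 0.
function RunTidy(const InFile, OutFile: string): Boolean;
var
  StartupInfo: TStartupInfo;
  ProcessInfo: TProcessInformation;
  CmdLine: string;
  ExitCode: DWORD;
begin
  Result := False;
  CmdLine := Format('tidy.exe -xml -wrap 0 -indent -quiet -o "%s" "%s"',
    [OutFile, InFile]);
  // CreateProcess may modify the command-line buffer, so make it writable.
  UniqueString(CmdLine);

  FillChar(StartupInfo, SizeOf(StartupInfo), 0);
  StartupInfo.cb := SizeOf(StartupInfo);

  if not CreateProcess(nil, PChar(CmdLine), nil, nil, False,
    CREATE_NO_WINDOW, nil, nil, StartupInfo, ProcessInfo) then
    Exit;
  try
    // Block until tidy has written the output file.
    WaitForSingleObject(ProcessInfo.hProcess, INFINITE);
    if GetExitCodeProcess(ProcessInfo.hProcess, ExitCode) then
      Result := ExitCode = 0;
  finally
    CloseHandle(ProcessInfo.hThread);
    CloseHandle(ProcessInfo.hProcess);
  end;
end;
```

A call such as RunTidy('inFile.xml', 'outFile.xml') would then mirror the command line above.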

Other possibilities for formatting XML are xmllint and XMLStarlet.

I couldn't get xmllint to run at all, but I'm sure I could have if I had persisted.

XMLStarlet seemed to work well, but it had no option to write to a file, only to stdout. I didn't use it because I would have had to work out how to capture the output.

dan-gph
  • Capturing stdout output is as simple as adding the following to the end of your command "> outfile.xml" – Codure May 01 '20 at 20:41