3

I'm aware there are several XML libraries out there, but unfortunately I am unable to use them for a school project I am working on.

I have a program that created this XML file.

<theKey>
<theValue>23432</theValue>
</theKey>

What I am trying to do is parse out "23432" between the tags. However, there are random tags in the file, so it may not always be on the second line from the top. Also, I don't know how many digits the number between the tags has.

Here is the code I developed so far. It is basic because I don't know what part of the C++ language I can use to parse the value out. My hunch, from working with Java, is to use something from the "String" library, but so far I am coming up short on what I can use.

Can anyone give me direction or a clue on what I can do/use? Thanks a lot.

Here is the code I developed so far:

#include <iostream>
#include <fstream>
#include <string>
#include <cstdlib>  // for system()

using std::cout;
using std::cin;
using std::endl;
using std::fstream;
using std::string;
using std::ifstream;


int main()
{
 ifstream inFile;
 inFile.open("theXML.xml");

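 if (!inFile)
 {
  // Stop early instead of silently reading from a stream that failed to open
  cout << "Could not open theXML.xml" << endl;
  return 1;
 }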

 string x;
 while (inFile >> x)
 {
  cout << x << endl;
 }

 inFile.close();

 system ( "PAUSE" );


 return 0;
}
different
  • Just grab Bison and use it to generate your own XML parser. – Anon. Feb 08 '10 at 22:58
  • ... or Boost Spirit, if you prefer. – Fred Larson Feb 08 '10 at 23:00
  • @*: He mentioned he can't use publicly available libraries -- homework possibly. – dirkgently Feb 08 '10 at 23:04
  • If you can use a regular expressions library (such as boost.regex or std::tr1::regex) then you might consider doing as this post says: http://immike.net/blog/2007/04/06/5-regular-expressions-every-web-programmer-should-know/ – Manuel Feb 08 '10 at 23:10
  • @dirk - He was talking about "XML libraries", maybe he can use something a bit more general. – Manuel Feb 08 '10 at 23:11
  • Yes, this is a homework assignment. The teacher never said we couldn't use XML libraries, but since I am learning C++ and these XML files are short, I'd rather learn how to do something like this for my own benefit. – different Feb 08 '10 at 23:19
  • The XML files may be short, but that doesn't make XML itself any less complicated. – bobince Feb 08 '10 at 23:24
  • @bobince: When someone wants to learn something, for heaven's sakes, let's help him. For this once, I might actually write his homework. – dirkgently Feb 08 '10 at 23:30
  • I have figured out a solution based on your ideas. Here is my basic algorithm: read the XML file into a string; use an iterator to iterate through the string; find my tag and record its location; parse out the value. This may not be the best solution, but it works. – different Feb 08 '10 at 23:49

4 Answers

7

To parse arbitrary XML, you really need a proper XML parser. When you include all the character-model nooks and DTD-related crannies of the language, it is not at all simple to parse, and it's a terrible faux pas to write a parser that only understands an arbitrary subset of XML.

In the real world, it would be wrong to use anything but a proper XML parser library to implement this. If you can't use a library and you can't change the program's output format to something more easily parsed (e.g. newline-separated key/value pairs), you're in an untenable position. Any school project that requires you to parse XML without an XML parser is totally misguided.

(Well, unless the whole point of the project is to write an XML parser in C++. But that would be a very cruel assignment.)

bobince
4

Here's an outline of what your code should look like (I've left out the tedious parts as an exercise):

std::string whole_file;

// TODO:  read your whole XML file into "whole_file"

std::size_t found = whole_file.find("<theValue>");

// TODO: ensure that the opening tag was actually found ...

std::string aux = whole_file.substr(found);
found = aux.find(">");

// TODO: ensure that the closing angle bracket was actually found ...

aux = aux.substr(found + 1);

std::size_t end_found = aux.find("</theValue>");

// TODO: ensure that the closing tag was actually found ...

std::string num_as_str = aux.substr(0, end_found); // "23432"

int the_num;

// TODO: convert "num_as_str" to int

This is not a proper XML parser of course, just something quick and dirty that solves your problem.
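For reference, here is one way the TODOs in that outline could be filled in. It is only a sketch under the same assumptions as the outline: the tag has no attributes, it appears once, and nothing like comments or CDATA gets in the way. The helper names (read_whole_file, extract_element_text, to_int) are made up for illustration and are not part of any library.

```cpp
#include <cstddef>
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: reads an entire file into `out`.
// Returns false if the file cannot be opened.
bool read_whole_file(const std::string& path, std::string& out)
{
    std::ifstream in(path.c_str());
    if (!in) return false;

    std::ostringstream buffer;
    buffer << in.rdbuf();  // copy the whole stream in one go
    out = buffer.str();
    return true;
}

// Hypothetical helper: copies the text between <tag> and </tag> into `out`.
// Returns false if either tag is missing. Assumes a tag without attributes.
bool extract_element_text(const std::string& xml, const std::string& tag,
                          std::string& out)
{
    const std::string open = "<" + tag + ">";
    const std::string close = "</" + tag + ">";

    const std::size_t start = xml.find(open);
    if (start == std::string::npos) return false;  // opening tag not found

    const std::size_t value_begin = start + open.size();
    const std::size_t value_end = xml.find(close, value_begin);
    if (value_end == std::string::npos) return false;  // closing tag not found

    out = xml.substr(value_begin, value_end - value_begin);
    return true;
}

// Hypothetical helper: converts a decimal string to int.
// Returns false if the text is not a number or has trailing junk.
bool to_int(const std::string& text, int& out)
{
    std::istringstream in(text);
    int value = 0;
    char leftover;
    if (!(in >> value)) return false;  // no number at the start
    if (in >> leftover) return false;  // non-whitespace after the number
    out = value;
    return true;
}
```

With the file from the question, you would call read_whole_file("theXML.xml", whole_file), then extract_element_text(whole_file, "theValue", num_as_str), then to_int(num_as_str, the_num), checking each return value before moving on.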

Manuel
2

You will need to write functions that, at a minimum, do the following:

  • If the node is a container node, then
    • Identify/parse elements (beginnings and ends) and attributes, if any
    • Parse children recursively
  • Otherwise, extract the value, trimming leading and trailing whitespace if it is not significant

The std::string provides quite a few useful member functions such as: find, find_first_of, substr etc. Try to use these in your functions.
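As a rough illustration of how those member functions fit together, here are two small helpers. The names trim and start_tag_name are hypothetical, and the second one assumes it is handed the position of a '<' that begins a start tag. Neither is a complete solution, just the kind of building block the list above calls for.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns `text` without leading and trailing whitespace.
std::string trim(const std::string& text)
{
    const char* const whitespace = " \t\r\n";

    const std::size_t first = text.find_first_not_of(whitespace);
    if (first == std::string::npos) return std::string();  // all whitespace

    const std::size_t last = text.find_last_not_of(whitespace);
    return text.substr(first, last - first + 1);
}

// Hypothetical helper: given the index of a '<' that opens a start tag,
// returns the element name (the text up to whitespace, '/' or '>').
// Returns an empty string if the tag is never terminated.
std::string start_tag_name(const std::string& xml, std::size_t lt)
{
    const std::size_t name_begin = lt + 1;
    const std::size_t name_end = xml.find_first_of(" \t\r\n/>", name_begin);
    if (name_end == std::string::npos) return std::string();

    return xml.substr(name_begin, name_end - name_begin);
}
```

find_first_of stops at any one of the listed characters, which is why it suits "where does the name end" better than find, which looks for one exact substring.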

dirkgently
2

The C++ Standard Library provides no XML parsing features. If you want to write this on your own, I suggest looking at std::getline() to read your data into strings (don't try to use operator>> for this), and then at the std::string class's basic features, like the substr() function, to chop it up. But be warned that writing your own XML parser, even a basic one, is very far from trivial.
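To make the difference concrete, here is a small sketch that reads the same input both ways. The function names read_lines and read_tokens are invented for this example. Both take a std::istream&, so they work with an ifstream or with an in-memory istringstream.

```cpp
#include <istream>
#include <sstream>  // std::istringstream, handy for trying these on in-memory text
#include <string>
#include <vector>

// Hypothetical helper: reads the stream one full line at a time.
// Spaces inside a line are preserved.
std::vector<std::string> read_lines(std::istream& in)
{
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line))
    {
        lines.push_back(line);
    }
    return lines;
}

// Hypothetical helper, for contrast: reads with operator>>, which splits
// on any whitespace, so a value containing a space is cut in two.
std::vector<std::string> read_tokens(std::istream& in)
{
    std::vector<std::string> tokens;
    std::string token;
    while (in >> token)
    {
        tokens.push_back(token);
    }
    return tokens;
}
```

If a line in the file were `<theValue>23 432</theValue>`, read_lines would hand it back in one piece, while read_tokens would split it at the space, leaving the opening and closing tags in different strings.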

  • Why is it preferred to use std::getline() over >>? – different Feb 08 '10 at 23:06
  • The stream operator>> is basically intended for reading space-delimited numeric values. You can make it work for other values, but it is particularly bad at reading strings, which may contain spaces. –  Feb 08 '10 at 23:09
  • I have figured out a solution based on your ideas. Here is my basic algorithm: read the XML file into a string; use an iterator to iterate through the string; find my tag and record its location; parse out the value. This may not be the best solution, but it works. – different Feb 08 '10 at 23:48