How can I parse an XML-like string and convert it to a separated list?

I am trying to convert the following string:

<Categories>
  <Category Assigned="0">
    6 Level
    <Category Assigned="1">
      6.2 Level
      <Category Assigned="0">
        6.3 Level
        <Category Assigned="0">
          6.4 Level
          <Category Assigned="1">
            6.5 Level
          </Category>
        </Category>
      </Category>
    </Category>
  </Category>
</Categories>

To a separated list like:

6 Level/6.2 Level/6.3 Level/6.4 Level/6.5 Level, 6 Level/6.2 Level

Robin Mills of exiv2 provided a Perl script: http://dev.exiv2.org/boards/3/topics/1912?r=1923#message-1923

That script would also need to handle Assigned="1". How can this be done in C++ for use in digikam, inside dmetadata.cpp, with a structure like:

    QStringList ntp = tagsPath.replaceInStrings("<Category Assigned=\"0\">", "/");

I don't have enough programming background to figure this out, and I haven't found any code samples online that do something similar. I'd also like to include the code in exiv2 itself, so that other applications can benefit.

Working code will be included in digikam: https://bugs.kde.org/show_bug.cgi?id=345220

asp

2 Answers

The code you have linked makes use of Perl's XML::Parser::Expat module, which is a glue layer on top of James Clark's Expat XML parser.

If you want to follow the same route, you should write C++ that uses the same library. It can be clumsy to use, though, because the API works through callbacks that you register to be called when certain events occur in the incoming XML stream. You can see them in the Perl code, where comments mark the handlers for start-of-element events and so on.

Once you have linked to the library, it should be simple to write C code that is equivalent to the Perl in the callbacks; they are only a single line each. Please open a new question if you have problems understanding the Perl.
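To make the callback idea concrete, here is a minimal sketch in standard C++ only. It does not use Expat: the `Handlers` struct, `scan` and `assignedPaths` are hypothetical names invented for this illustration, and `scan` is a toy tokenizer that only copes with input shaped like the sample in the question. The part worth looking at is the three handlers, which keep a stack of open categories and emit the joined path whenever a category with Assigned="1" receives its name.

```cpp
#include <cctype>
#include <functional>
#include <string>
#include <vector>

// Hypothetical event interface, loosely modelled on the start/end/text
// callbacks of an event-driven parser. This is NOT Expat's real API.
struct Handlers {
    std::function<void(const std::string& name, bool assigned)> startElement;
    std::function<void(const std::string& text)> characterData;
    std::function<void(const std::string& name)> endElement;
};

// Strip leading and trailing whitespace.
static std::string trim(const std::string& s) {
    std::size_t b = 0, e = s.size();
    while (b < e && std::isspace(static_cast<unsigned char>(s[b]))) ++b;
    while (e > b && std::isspace(static_cast<unsigned char>(s[e - 1]))) --e;
    return s.substr(b, e - b);
}

// Toy tokenizer for the sample input only: no entities, comments, CDATA,
// self-closing tags or error reporting. A real parser should replace it.
void scan(const std::string& xml, const Handlers& h) {
    std::size_t pos = 0;
    while (pos < xml.size()) {
        if (xml[pos] == '<') {
            std::size_t close = xml.find('>', pos);
            if (close == std::string::npos) break;  // unterminated tag: stop
            std::string tag = xml.substr(pos + 1, close - pos - 1);
            if (!tag.empty() && tag[0] == '/') {
                h.endElement(tag.substr(1));
            } else {
                std::string name = tag.substr(0, tag.find(' '));
                bool assigned = tag.find("Assigned=\"1\"") != std::string::npos;
                h.startElement(name, assigned);
            }
            pos = close + 1;
        } else {
            std::size_t next = xml.find('<', pos);
            if (next == std::string::npos) next = xml.size();
            std::string text = trim(xml.substr(pos, next - pos));
            if (!text.empty()) h.characterData(text);
            pos = next;
        }
    }
}

// Collect the '/'-joined path of every category marked Assigned="1",
// in document order.
std::vector<std::string> assignedPaths(const std::string& xml) {
    struct Level { std::string name; bool assigned; };
    std::vector<Level> stack;          // currently open <Category> elements
    std::vector<std::string> result;

    Handlers h;
    h.startElement = [&](const std::string& name, bool assigned) {
        if (name == "Category") stack.push_back({std::string(), assigned});
    };
    h.characterData = [&](const std::string& text) {
        // Only the first text seen inside a category is taken as its name.
        if (stack.empty() || !stack.back().name.empty()) return;
        stack.back().name = text;
        if (stack.back().assigned) {
            std::string path;
            for (const Level& level : stack) {
                if (!path.empty()) path += '/';
                path += level.name;
            }
            result.push_back(path);
        }
    };
    h.endElement = [&](const std::string& name) {
        if (name == "Category" && !stack.empty()) stack.pop_back();
    };

    scan(xml, h);
    return result;
}
```

With a real event-driven parser, the three lambdas would become the registered callbacks and `scan` would disappear. Note that the paths come out in document order, so the shorter path appears before the longer one, unlike the ordering in the question's example.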

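Note also that Expat is a non-validating parser: it rejects XML that is not well-formed, but it does not check the data against a DTD or schema, so structurally unexpected data will pass without comment.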

Given that the biggest task is to parse the XML data in the first place, you may prefer a different solution that lets you build an in-memory document structure from the XML data and interrogate it using the Document Object Model (DOM). The libxml2 library allows you to do that, and it has its own Perl glue layer in the XML::LibXML module.
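As a rough illustration of the tree-based approach, the sketch below builds a small in-memory tree and then walks it recursively. It uses only the C++ standard library; `CategoryNode`, `buildTree`, `collectAssigned` and `assignedPathsFromTree` are hypothetical names for this example, and `buildTree` is a toy stand-in for what a DOM library would do properly. With a real library, only the recursive walk would remain, rewritten against that library's node type.

```cpp
#include <cctype>
#include <memory>
#include <string>
#include <vector>

// Hypothetical in-memory node; a DOM library would supply its own type.
struct CategoryNode {
    std::string name;
    bool assigned = false;
    std::vector<std::unique_ptr<CategoryNode>> children;
};

// Strip leading and trailing whitespace.
static std::string stripSpace(const std::string& s) {
    std::size_t b = 0, e = s.size();
    while (b < e && std::isspace(static_cast<unsigned char>(s[b]))) ++b;
    while (e > b && std::isspace(static_cast<unsigned char>(s[e - 1]))) --e;
    return s.substr(b, e - b);
}

// Toy tree builder for input shaped like the sample: it recognises only
// <Category ...>, </Category> and plain text, and ignores everything else.
std::unique_ptr<CategoryNode> buildTree(const std::string& xml) {
    std::unique_ptr<CategoryNode> root(new CategoryNode);
    std::vector<CategoryNode*> open{root.get()};  // chain of open elements
    std::size_t pos = 0;
    while (pos < xml.size()) {
        if (xml[pos] == '<') {
            std::size_t close = xml.find('>', pos);
            if (close == std::string::npos) break;  // unterminated tag: stop
            std::string tag = xml.substr(pos + 1, close - pos - 1);
            bool isCategory = tag.compare(0, 8, "Category") == 0 &&
                              (tag.size() == 8 || tag[8] == ' ');
            if (tag == "/Category") {
                if (open.size() > 1) open.pop_back();
            } else if (isCategory) {
                std::unique_ptr<CategoryNode> node(new CategoryNode);
                node->assigned =
                    tag.find("Assigned=\"1\"") != std::string::npos;
                CategoryNode* raw = node.get();
                open.back()->children.push_back(std::move(node));
                open.push_back(raw);
            }
            pos = close + 1;
        } else {
            std::size_t next = xml.find('<', pos);
            if (next == std::string::npos) next = xml.size();
            std::string text = stripSpace(xml.substr(pos, next - pos));
            // The first text inside a category is taken as its name.
            if (!text.empty() && open.size() > 1 && open.back()->name.empty())
                open.back()->name = text;
            pos = next;
        }
    }
    return root;
}

// Depth-first walk: record the '/'-joined path of each assigned category.
void collectAssigned(const CategoryNode& node, const std::string& prefix,
                     std::vector<std::string>& out) {
    for (const auto& child : node.children) {
        std::string path =
            prefix.empty() ? child->name : prefix + "/" + child->name;
        if (child->assigned) out.push_back(path);
        collectAssigned(*child, path, out);
    }
}

// Convenience wrapper: parse, walk, return the assigned paths.
std::vector<std::string> assignedPathsFromTree(const std::string& xml) {
    std::vector<std::string> out;
    collectAssigned(*buildTree(xml), std::string(), out);
    return out;
}
```

The design point is that once the document is a tree, the conversion is a short recursion that carries the path prefix down with it, and sibling categories are handled without any extra bookkeeping.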

Borodin
  • I'm going to change the question to leave out the Perl conversion then, as it is not the important part. – asp Apr 11 '15 at 20:38
  • @asp: Hmm okay. But what *is* the important part? I think it's fine to link to the Perl code to show the algorithm for the transformation that you need. But if you're aiming to describe that algorithm without reference to the Perl implementation then that may be a backwards step, especially if you're not familiar with Perl – Borodin Apr 11 '15 at 20:49
  • The important part is the string conversion, not the Perl example. – asp Apr 13 '15 at 01:55
  • I think the solution remains the same: pick an XML parser library and write some code that uses it to parse the XML and extract the text nodes – Borodin Apr 13 '15 at 13:34
  • Thanks @Borodin. I am looking for examples that do this type of thing. Gilles over at digikam says that it should be possible using QStringList ... – asp Apr 13 '15 at 17:12
  • @asp: I strongly encourage you to use a proper XML parser. QStringList isn't one of those. If you download [the `libxml` library](http://www.xmlsoft.org/) and start to use it, I think you will find it simpler than you imagined. It is very comprehensive, and so the docs are extensive, but the individual concepts are straightforward – Borodin Apr 13 '15 at 17:25

Maik Qualmann has provided a working patch for digikam!

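// Excerpt from a larger function: tagsPath is a QStringList declared
// outside this snippet, and the enclosing function returns bool.
QString xmlACDSee = getXmpTagString("Xmp.acdsee.categories", false);
if (!xmlACDSee.isEmpty())
{
    // Drop the outer wrapper and turn every remaining "/" into "|", so that
    // closing tags become "<|Category>" and "/" is free as path separator.
    xmlACDSee.remove("</Categories>");
    xmlACDSee.remove("<Categories>");
    xmlACDSee.replace("/", "|");

    // Each piece now starts with ="0"> or ="1"> followed by the tag name
    // and any closing tags that come before the next opening tag.
    QStringList tagsXml = xmlACDSee.split("<Category Assigned");
    int category        = 0;   // current nesting depth
    int length;
    int count;

    foreach(const QString& tags, tagsXml)
    {
        if (!tags.isEmpty())
        {
            // Number of closing tags in this piece (11 characters each);
            // the 5 leading characters are the ="0"> or ="1"> prefix.
            count  = tags.count("<|Category>");
            length = tags.length() - (11 * count) - 5;

            if (category == 0)
            {
                // Top-level category: start a new path.
                tagsPath << tags.mid(5, length);
            }
            else
            {
                // Nested category: extend the path being built.
                tagsPath.last().append(QString("/") + tags.mid(5, length));
            }

            category = category - count + 1;

            // An assigned category that is still open: keep a copy of the
            // path so that deeper categories extend the copy.
            if (tags.left(5) == QString("=\"1\">") && category > 0)
            {
                tagsPath << tagsPath.value(tagsPath.size() - count - 1);
            }
        }
    }

    if (!tagsPath.isEmpty())
    {
        return true;
    }
}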
asp