
Hi everyone! I'm new to C++, so, alas, I make silly mistakes. This is a snippet of a .txt file's content:

<tag attr1="value1" attr2="value2" ... >

What I'm trying to accomplish is to parse the .txt file and generate the following output:

Tag: tag
name: attr1
value: value1
name: attr2
value: value2

What I've done so far didn't work (my problem is the delimiters):

#include <iostream>
#include <sstream>
#include <string>
#include <vector>
#include <fstream>

using namespace std;

struct tagline {
    string tag;
    string attributeN;
    string attributeV;
};

int main() {
    vector<tagline> information;
    string line;
    tagline t;

    ifstream readFile("file.txt");
    while (getline(readFile, line)) {
        stringstream in(line);
        getline(in, t.tag);
        getline(in, t.attributeN, '=');
        getline(in, t.attributeV, '"');
        information.push_back(t);
    }

    vector<tagline>::iterator it = information.begin();

    for (; it != information.end(); it++) {
        cout << "Tag: " << (*it).tag << " \n"
             << "name: " << (*it).attributeN << " \n"
             << "value: " << (*it).attributeV << " \n";
    }
    return 0;
}

All I get is a plain display of the snippet as it's formatted in the .txt-file:

<tag attr1="value1" attr2="value2" ... >

I would be happy if someone could help. Thank you!

mariechen
  • It is because you're getlining multiple times on a line. You might want to getline into buffer, and then, depending on the line index, assign it to member. Better solution would be overloading `operator>>`. – Incomputable Dec 01 '17 at 20:29
  • What I don't really understand is how to do the buffer-method with multiple delimiters. Would you mind posting a code-example? If it's not too much trouble. :) – mariechen Dec 01 '17 at 20:39
  • Is using an XML parser library an option? – Stephan Lechner Dec 01 '17 at 20:40
  • It seems like I misunderstood problem statement. I would imbue a new `cctype` in this case. Do values contain whitespaces? If not, this is a piece of cake to solve with `cctype`. – Incomputable Dec 01 '17 at 20:40
  • @Stephan Lechner I haven't really worked with parser libraries, yet (I do know some XML, though), so I wouldn't know how to implement it right away. – mariechen Dec 01 '17 at 20:44
  • @Incomputable No, there are no whitespaces. It would be a piece of cake if it wasn't for the delimiters. ;) I would know how to do it, if there was only one type of delimiter. – mariechen Dec 01 '17 at 20:53
  • @mariechen, if you care about code quality, then, after making it work, you can come to [CodeReview.se](https://codereview.stackexchange.com/), where code gets better. Though, it should be working before posting, unlike SO. – Incomputable Dec 01 '17 at 20:55

3 Answers


This would be better handled using an HTML/XML parser (depending on what your file actually contains).

That being said, you are not parsing the lines correctly.

Your first call to getline(in,t.tag); is not specifying a delimiter, so it reads the entire line, not just the first word. You would have to use getline(in, t.tag, ' '); instead.

Also, your tags can have multiple attributes, but you are only reading and storing the first attribute, ignoring the rest. You need a loop to read all of them, and a std::vector to store them all into.

Try something more like this instead:

#include <iostream>
#include <sstream>
#include <string>
#include <vector>
#include <fstream>

using namespace std;

struct tagattribute {
    string name;
    string value;
};

struct tagline {
    string tag;
    vector<tagattribute> attributes;
};

int main() {
    vector<tagline> information;
    string line;

    ifstream readFile("file.txt");
    while (getline(readFile, line)) {
        istringstream in(line);

        tagline t;
        tagattribute attr;

        in >> ws;

        char ch = in.get();
        if (ch != '<')
            continue;

        if (!(in >> t.tag))
            continue;

        do
        {
            in >> ws;

            ch = in.peek();
            if (ch == '>')
                break;

            if (getline(in, attr.name, '=') &&
                in.ignore() &&
                getline(in, attr.value, '"'))
            {
                t.attributes.push_back(attr);
            }
            else
                break;
        }
        while (true);

        information.push_back(t);
    }

    vector<tagline>::iterator it = information.begin();
    for(; it != information.end(); ++it) {
        cout << "Tag: " << it->tag << "\n";

        vector<tagattribute>::iterator it2 = it->attributes.begin();
        for(; it2 != it->attributes.end(); ++it2) {
            cout << "name: " << it2->name << "\n"
            << "value: " << it2->value << "\n";
        }

        cout << "\n";
    }

    return 0;
}

Live demo

Alternatively, consider writing custom operator>> overloads to help with the parsing, e.g.:

#include <iostream>
#include <sstream>
#include <string>
#include <vector>
#include <fstream>

using namespace std;

struct tagattribute {
    string name;
    string value;
};

istream& operator>>(istream &in, tagattribute &attr)
{
    getline(in, attr.name, '=');
    in.ignore();
    getline(in, attr.value, '"');
    return in;
}

struct tagline {
    string tag;
    vector<tagattribute> attributes;
};

istream& operator>>(istream &in, tagline &t)
{
    tagattribute attr;

    in >> ws;

    char ch = in.get();
    if (ch != '<')
    {
        in.setstate(ios_base::failbit);
        return in;
    }

    if (!(in >> t.tag))
        return in;

    do
    {
        in >> ws;

        ch = in.peek();
        if (ch == '>')
        {
            in.ignore();
            break;
        }

        if (!(in >> attr))
            break;

        t.attributes.push_back(attr);
    }
    while (true);

    return in;
}

int main() {
    vector<tagline> information;
    string line;

    ifstream readFile("file.txt");
    while (getline(readFile, line)) {
        istringstream in(line);
        tagline t;     

        if (in >> t)
            information.push_back(t);
    }

    vector<tagline>::iterator it = information.begin();
    for(; it != information.end(); ++it) {
        cout << "Tag: " << it->tag << "\n";

        vector<tagattribute>::iterator it2 = it->attributes.begin();
        for(; it2 != it->attributes.end(); ++it2) {
            cout << "name: " << it2->name << "\n"
            << "value: " << it2->value << "\n";
        }

        cout << "\n";
    }

    return 0;
}

Live demo

Remy Lebeau
  • Oooh, thank you so much! :) I tried it the way you suggested it, using "getline(in, t.tag, ' ');", but I got an error on that, although I think it was also related to an error somewhere else in the code. Thanks so much for your effort! :) – mariechen Dec 01 '17 at 21:31

Well, I would try to do something like this, based on a wonderful answer elsewhere that uses a custom std::ctype facet:

struct xml_skipper : std::ctype<char> {
    xml_skipper() : ctype(make_table()) { }
private:
    static mask* make_table() {
        const mask* classic = classic_table();
        static std::vector<mask> v(classic, classic + table_size);
        v[','] |= space;
        v['"'] |= space;
        v['='] |= space;
        v['<'] |= space;
        v['>'] |= space;
        return &v[0];
    }
};

Then, what you can do is just keep reading:

ifstream readFile("file.txt");
while(getline(readFile,line)){
    istringstream in(line);
    in.imbue(std::locale(in.getloc(), new xml_skipper));
    in >> t.tag >> t.attributeN >> t.attributeV;
    information.push_back(t);
}
//...

Do note that this will break if values or attribute names contain whitespace, or any of the characters the facet treats as whitespace (',', '"', '=', '<' and '>').


If you want something more serious, you will need to write a lexer, a syntax tree builder and a semantic tree builder.


Full code

#include <iostream>
#include <sstream>
#include <string>
#include <vector>
#include <fstream>
#include <locale>

using namespace std;

struct tagline{
    string tag;
    string attributeN;
    string attributeV;
};

struct xml_skipper : std::ctype<char> {
    xml_skipper() : ctype(make_table()) { }
private:
    static mask* make_table() {
        const mask* classic = classic_table();
        static std::vector<mask> v(classic, classic + table_size);
        v[','] |= space;
        v['"'] |= space;
        v['='] |= space;
        v['<'] |= space;
        v['>'] |= space;
        return &v[0];
    }
};

int main(){
    vector<tagline> information;
    string line;
    tagline t;
    // Stands in for ifstream readFile("file.txt") so the example runs without a file.
    std::istringstream readFile{"<tag attr1=\"value1\" attr2=\"value2\" ... >"};
    while(getline(readFile,line)){
        istringstream in(line);
        in.imbue(std::locale(in.getloc(), new xml_skipper));
        in >> t.tag >> t.attributeN >> t.attributeV;
        information.push_back(t);
    }


    vector<tagline>::iterator it = information.begin();

    for(; it != information.end(); it++){
        cout << "Tag: " << (*it).tag << " \n"
             << "name: " << (*it).attributeN << " \n"
             << "value: " << (*it).attributeV << " \n";
    }
}

Live on Wandbox.

Incomputable
  • You're a magician! :D It works! Sorry for taking so long, I was implementing the code snippets you posted above the full code while looking up the method you used, and got stuck on YouTube. Thank you sooo much! Thanks for teaching me about the cctype-method! – mariechen Dec 01 '17 at 21:24
  • @mariechen, you're welcome. And I confused `ctype` with C's `cctype` header, so this approach is basically new locale approach (I'm not really sure how this is called). Also, the other answer has some important points. Do you need only the first attribute? – Incomputable Dec 01 '17 at 21:26
  • I need to display the others as well. But I'll figure that out on my own. You've helped me so much already, thank you! :) – mariechen Dec 01 '17 at 21:32

If your input may vary within the boundaries of the XML specification, an XML parser might be a better approach than parsing the string "manually". Just to show how this could look, see the following code. It is based on tinyxml2, which only requires adding a single .cpp/.h file pair to your project. You could, of course, also use any other XML library; this is just for demonstration purposes:

#include <iostream>
#include "tinyxml2.h"
using namespace tinyxml2;

int main()
{
    // Single quotes, spaces around '=' and a self-closing tag are all valid XML.
    const char* test = "<tag attr1='value1' attr2 = \"value2\"/>";
    XMLDocument doc;
    doc.Parse(test);
    XMLElement *root = doc.RootElement();
    if (root) {
        std::cout << "Tag: " << root->Name() << std::endl;
        // Walk the element's attributes, which tinyxml2 exposes as a linked list.
        const XMLAttribute *attrib = root->FirstAttribute();
        while (attrib) {
            std::cout << "name: " << attrib->Name() << std::endl;
            std::cout << "value: " << attrib->Value() << std::endl;
            attrib = attrib->Next();
        }
    }
}
Stephan Lechner
  • Thank you for the advice! :) So it's basically like using jQuery in JavaScript in order to reduce the amount of code and make things more simple, I guess? That's so cool! Like I said, I'm a newbie to C++, so I haven't really made it this far, yet. But it's always helpful to learn something new. Thank you very much! :D – mariechen Dec 01 '17 at 22:22