I have a file A that contains multiple paragraphs, and I need to identify where words from another file B occur in it. For every word, including those that match a word in file B, I need to report the paragraph number, line number, and word number.

I've gotten this far after giving up on vectors, arrays, and string splitting, and learning (I think) stringstream. Currently I read in a line, split it on "." into sentences, then read those sentences back in again, splitting on " ". The line numbers are counting, and the words are counting and matching, but I can't seem to get the paragraph numbers. (I've realized that the p++ is actually counting the lines, and the l++ is counting words as well.) Could someone please help me?

Edit: Each paragraph is separated by "\n" and each sentence is separated by a ".". I'll still have to figure out a way to ignore all other punctuation so that words match 100% and are not thrown off by a comma, semicolon, or other punctuation. I'm guessing there will be a regex in there somewhere.

Input from the text file would look like:

    My dog has fleas in his weak knees. This is a line.  The paragraph is ending.'\n'
    Fleas is a word to be matched.  here is another line.  The paragraph is ending.'\n'

Output should look something like:

    paragraph1 line 1 word 1  My
    paragraph1 line 1 word 2  dog
    paragraph1 line 1 word 3  has
    paragraph1 line 1 word 4  MATCHED!  fleas

while (getline(fin, para)) { //get the paragraphs
    pbuffer.clear();
    pbuffer.str("."); //split on periods
    pbuffer << para;
    p++; //increase paragraph number

    while (pbuffer >> line) { //feed back into a new buffer

        lbuffer.clear();
        lbuffer.str(" "); //splitting on spaces
        lbuffer << line;
        l++; //line counter

        while (lbuffer >> word) { //feed back in
            cout << "l " << l << "   W:  " << w << "   " << word;
            fmatch.open("match.txt");
            while (fmatch >> strmatch) {  //did I find a match?
                if (strmatch.compare(word) == 0) {
                    cout << "  Matched!\n";
                }
                else {
                    cout << "\n";
                }

            }
silly_girl
  • Can you put your code back to how it was? Also, can you provide input and expected output, and then provide the output you are seeing? – AndyG Nov 15 '16 at 12:34
  • If paragraphs are separated by a newline and so are lines, then what's the difference between a paragraph number and a line number? – AndyG Nov 15 '16 at 12:40
  • A line ends with a "." and a paragraph ends with a \n. I thought that by bringing in the line, I could count the paragraphs, but quickly realized that by splitting the line with the "." delimiter during the pbuffer process, what I was actually getting was the line numbers based upon the delimiter. I feel like I'm missing something stupidly simple, but just can't figure it out. @AndyG – silly_girl Nov 15 '16 at 12:53
  • Line number is not going to work for you: http://stackoverflow.com/a/40548142/2642059 but it looks like you're just looking for word, sentence, and paragraph, is that correct? – Jonathan Mee Nov 15 '16 at 13:07
  • yes, I need the word, sentence, and paragraph number of the match. I've got it getting line and word, but not paragraph. – silly_girl Nov 15 '16 at 13:09
  • So you're trying to put the words into a container right? Or do you just want their indexes and then you're going to throw them away? – Jonathan Mee Nov 15 '16 at 13:12
  • I need to keep the words long enough to write them to a CSV file, instead of cout on the screen. @JonathanMee – silly_girl Nov 15 '16 at 13:14
  • Could you write them on read? Or do you need the complete collection before you begin writing? (I'm writing you an answer right now which holds them in a container... Just want to make sure I'm giving you something you want.) – Jonathan Mee Nov 15 '16 at 13:17
  • I could write them on read, if they matched. I do not need a complete collection. I've edited the post to include sample input and the corresponding output. – silly_girl Nov 15 '16 at 13:21
  • "A line ends with a "."". That's one unorthodox definition of "line". Are you sure your assignment says that? "then split it on the "." into sentences". This seems to imply "line" and "sentence" are interchangeable in the context of your assignment. I would double-check that. (I assumed it's an assignment). – n. m. could be an AI Nov 15 '16 at 13:42
  • it's not an assignment, as it's for work. I really mean sentence when I say line. – silly_girl Nov 15 '16 at 13:44

2 Answers

Since you say that you can write each word on read, we won't bother with a collection. We'll just use istringstream and istream_iterator and count the indices.
Assuming that fin is good, I'm going to simply write to cout; you can make the appropriate adjustments to write to your file.

First you'll need to read the contents of "match.txt" (your fmatch stream) into a vector<string> like so:

const vector<string> strmatch{ istream_iterator<string>(fmatch), istream_iterator<string>() };
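For comparison, here is a minimal sketch of the iterator-range construction next to an explicit push_back loop. The function names loadWordsWithIterators and loadWordsWithLoop are hypothetical, introduced only for illustration; both are assumed to be given an already-open stream such as fmatch.

```cpp
#include <istream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Range construction: each increment of istream_iterator<string> reads one
// whitespace-separated token, and the default-constructed iterator marks
// end-of-stream.
std::vector<std::string> loadWordsWithIterators(std::istream& in) {
    return std::vector<std::string>(std::istream_iterator<std::string>(in),
                                    std::istream_iterator<std::string>());
}

// The same result with an explicit loop using operator>> and push_back.
std::vector<std::string> loadWordsWithLoop(std::istream& in) {
    std::vector<std::string> words;
    std::string word;
    while (in >> word) {
        words.push_back(word);
    }
    return words;
}
```

Both versions split on whitespace, so either should work for a file of match words separated by spaces or newlines.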

Then you'll just wanna use that in a nested loop:

string paragraph;
string sentence;

for(auto p = 1; getline(fin, paragraph, '\n'); ++p) {
    istringstream sentences{ paragraph };

    for(auto s = 1; getline(sentences, sentence, '.'); ++s) {
        istringstream words{ sentence };

        for_each(istream_iterator<string>(words), istream_iterator<string>(), [&, i = 1](const auto& word) mutable {
            cout << 'w' << i++ << ", p" << p << ", s" << s
                 << (find(cbegin(strmatch), cend(strmatch), word) == cend(strmatch) ? ", word, " : ", namedEntity, ")
                 << word << endl;
        });
    }
}

Live Example

EDIT:

By way of explanation, I'm using a for_each to call a lambda on each word in the sentence.

Let's break apart the lambda and explain what each section does:

  • [& This exposes, by reference, any variable in the scope in which the lambda was declared to the lambda for use: http://en.cppreference.com/w/cpp/language/lambda#Lambda_capture Because I'm using strmatch, p, and s in the lambda, those will be captured by reference.
  • , i = 1] C++14 allows us to declare a variable of type auto in the lambda capture, so i is an int that is reinitialized each time the scope in which the lambda is declared is re-entered; here that's each entry into the body of the nested for-loop.
  • (const auto& word) This is the parameter list passed into the lambda: http://en.cppreference.com/w/cpp/language/lambda Here for_each will just be passing in strings.
  • mutable Because I'm modifying i, which is owned by the lambda, I need it to be non-const, so I declare the lambda mutable.

In the lambda's body I'll just use find with standard insertion operators to write the values.
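The capture pieces described above can be seen in isolation in the following minimal sketch. It assumes a C++14 compiler; numberWords is a hypothetical helper that collects labels into a vector instead of writing to cout.

```cpp
#include <algorithm>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: labels each word of a sentence with its 1-based index.
std::vector<std::string> numberWords(const std::string& sentence) {
    std::istringstream words{ sentence };
    std::vector<std::string> labelled;

    std::for_each(std::istream_iterator<std::string>(words),
                  std::istream_iterator<std::string>(),
                  // [&] captures labelled by reference; i = 1 is a C++14
                  // init-capture owned by the lambda; mutable permits i++.
                  [&, i = 1](const auto& word) mutable {
                      labelled.push_back("w" + std::to_string(i++) + " " + word);
                  });
    return labelled;
}
```

Because the lambda is created afresh on every call, the counter starts at 1 again for each sentence.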

EDIT2:

If you're limited to C++11, you won't be able to declare a variable in the lambda capture. You can just provide that externally:

string paragraph;
string sentence;

for(auto p = 1; getline(fin, paragraph, '\n'); ++p) {
    istringstream sentences{ paragraph };

    for(auto s = 1; getline(sentences, sentence, '.'); ++s) {
        istringstream words{ sentence };
        auto i = 1;

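        // C++11 has no generic lambdas, so the parameter type is spelled out;
        // the member cbegin()/cend() are used because the free functions are C++14.
        for_each(istream_iterator<string>(words), istream_iterator<string>(), [&](const string& word) {
            cout << 'w' << i++ << ", p" << p << ", s" << s
                 << (find(strmatch.cbegin(), strmatch.cend(), word) == strmatch.cend() ? ", word, " : ", namedEntity, ")
                 << word << endl;
        });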
    }
}
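Note that find compares strings exactly, so "Fleas" or "fleas," would not match an entry "fleas". The following is a minimal sketch of one way to normalize words before comparing, assuming ASCII text; normalizeWord and matchesAny are hypothetical helpers, and this uses plain character tests rather than a regex.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper: drops ASCII punctuation and lower-cases the rest,
// so "Fleas," and "fleas" normalize to the same key.
std::string normalizeWord(const std::string& word) {
    std::string key;
    for (unsigned char c : word) {
        if (!std::ispunct(c)) {
            key += static_cast<char>(std::tolower(c));
        }
    }
    return key;
}

// Hypothetical helper: true if the normalized word equals any normalized entry.
bool matchesAny(const std::vector<std::string>& strmatch, const std::string& word) {
    const std::string key = normalizeWord(word);
    return std::any_of(strmatch.begin(), strmatch.end(),
                       [&key](const std::string& candidate) {
                           return normalizeWord(candidate) == key;
                       });
}
```

One trade-off of this approach: removing every punctuation character also turns "don't" into "dont", which may or may not be what you want for your match list.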
Jonathan Mee
  • While that solution is elegant, I'm really not getting istream_iterator. Why wouldn't I just push_back as getline from fmatch into the vector? I'm receiving a strmatch redefinition error, as I'm in Visual Studio. I think there's a different way to write iterator for Visual Studio, so I'm off to go do more research @JonathanMee Thanks for pointing me to the right direction! – silly_girl Nov 15 '16 at 14:44
  • @Melissagoodall I've edited to clarify. If you need further clarification on a specific point I can elaborate further. I'd like to work with you to get this answer to the point where you can accept it as a solution to your problem. Could you post the specific error you're seeing, perhaps as an edit to your answer if it's huge? It sounds as though you've declared `strmatch` and then in the same scope have pasted in my code which declares and initializes `strmatch` inline. Search your code for `strmatch` and make sure it's only declared at one point. – Jonathan Mee Nov 15 '16 at 15:00
  • Please note, that this coding is beyond me :). My floor may soon be filling up with strands of long brown hair. Using your example, exactly as you typed it in the "Live" tester, VS2013 hates the for each. Errors start with "missing ] before =" and "i lamda capture variable not found" and the list goes on. I hate this project now. – silly_girl Nov 15 '16 at 16:18
  • @Melissagoodall [Visual Studio 2015](https://msdn.microsoft.com/en-us/library/dd293608(v=vs.140).aspx) is the first to mention C++14's lambda syntax, I don't see any mention of it in the [Visual Studio 2013](https://msdn.microsoft.com/en-us/library/dd293608(v=vs.120).aspx) documentation. Thus I suspect that you'll need to declare `i` outside the lambda capture. I've updated my answer. If this doesn't solve it, follow up with me again and we'll keep at it. – Jonathan Mee Nov 15 '16 at 17:48
  • 1
    I posted my much less elegant solution below. Thank you for helping me work through it :) I"ll study up on string interator and lamda syntax for sure! – silly_girl Nov 15 '16 at 18:41
  • @Melissagoodall I'm thrilled you got it working. It's incredibly frustrating to be not completely sure what's going on in code and to have stuff failing. Good job sticking with it. While my code is shorter than yours, the most important thing is that you understand what's happening. If in the future you want to take another pass at trying to understand this don't hesitate to shoot me a comment, and we'll try again! – Jonathan Mee Nov 15 '16 at 18:45

I did finally figure it out, but I didn't use the stream iterator (sorry!), and it's certainly not as elegant, @JonathanMee.

I put the matching words into a vector and used stringstream to read in the characters, nesting it as I went. I then used an if statement to check for paragraphs, and delimited as I poured the data from one string to another using stringstream. I incremented when I delimited the data, and then did the match. Example:

            pholder.clear();
            pholder.str("."); //break on the delimiter
            pholder << para; //read from the paragraph into pholder
            l++;

            while (pholder >> line) {// here are all my lines now

                lholder.clear();
                lholder.str(" "); //break on the spaces
                lholder << line; //read for it
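The nested-stringstream idea described above can be sketched end to end as follows. This is not the exact code from this answer: stringstream::str(".") sets the stream's contents rather than a delimiter, and operator>> always splits on whitespace, so the sketch assumes std::getline with an explicit '.' delimiter for sentences. WordHit and tagWords are hypothetical names, and the sentence counter is assumed to restart in each paragraph.

```cpp
#include <algorithm>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record type for one word and its position.
struct WordHit {
    int paragraph;
    int sentence;
    int word;
    std::string text;
    bool matched;
};

std::vector<WordHit> tagWords(std::istream& fin, const std::vector<std::string>& matches) {
    std::vector<WordHit> hits;
    std::string para, line, word;
    int p = 0;

    while (std::getline(fin, para)) {               // one paragraph per '\n'
        ++p;
        std::istringstream pholder(para);
        int l = 0;
        while (std::getline(pholder, line, '.')) {  // one sentence per '.'
            ++l;
            std::istringstream lholder(line);
            int w = 0;
            while (lholder >> word) {               // one word per whitespace run
                ++w;
                const bool matched =
                    std::find(matches.begin(), matches.end(), word) != matches.end();
                hits.push_back(WordHit{p, l, w, word, matched});
            }
        }
    }
    return hits;
}
```

Each WordHit can then be written out as one CSV row. The comparison here is exact, so "Fleas" would not match "fleas" unless the words are normalized first.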
silly_girl