Update (fixed): I have fixed the problem that caused the error message. Huge thanks to user PaulMcKenzie for helping me understand what the error message was telling me! When my program encountered a letter with a mark above it (diacritical marks, I think they are called), it crashed. I have adjusted my code to account for these, and now it doesn't crash at all. Another huge thanks to user ihavenoidea for helping me understand multisets! My program is now working the way it's supposed to.
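
The adjusted code is not shown in this post. One common way to make a punctuation filter tolerate accented letters is to convert each `char` to `unsigned char` before it reaches `std::ispunct`, since passing a negative `char` value to that function is undefined behaviour and can trip a debug assertion. A minimal sketch under that assumption (the name `cleanUpPuncSafe` is hypothetical and not from the original program):

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical replacement for cleanUpPunc(): strips punctuation from a word.
std::string cleanUpPuncSafe(std::string word) {
    word.erase(std::remove_if(word.begin(), word.end(),
                              // Taking the parameter as unsigned char keeps the
                              // value passed to std::ispunct in the range 0..255.
                              [](unsigned char ch) { return std::ispunct(ch) != 0; }),
               word.end());
    return word;
}
```

In the default "C" locale, bytes outside the ASCII range are not classified as punctuation, so accented letters should pass through unchanged.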

Original post: ****I am VERY new to C++ so any and all help is appreciated!****

Ok, so I'm trying to use a multiset to sort words so I can see how many times each word appears in a text. First, my program accepts a file name, then it reads the words and strips out any punctuation, then it puts each word into a multiset. After this, it is supposed to write the results to a text file that the user names.

My first issue is that the multiset seems to be creating more than one element for the same word (for example, in one of my tests I saw a(4) listed in the text document 3 times in a row instead of once).
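
For reference, this behaviour follows from how `std::multiset` stores data: every inserted duplicate is a separate element, so a loop over all elements prints a word once per occurrence. A sketch of one way to visit each distinct word only once, by jumping to `upper_bound` of the current word (the helper name `distinctCounts` is an assumption made for illustration):

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns one (word, count) pair per distinct word.
std::vector<std::pair<std::string, std::size_t>>
distinctCounts(const std::multiset<std::string>& words) {
    std::vector<std::pair<std::string, std::size_t>> result;
    // upper_bound(*pos) points just past the last copy of the current word,
    // so each pass of the loop lands on the next distinct word.
    for (auto pos = words.begin(); pos != words.end(); pos = words.upper_bound(*pos)) {
        result.emplace_back(*pos, words.count(*pos));
    }
    return result;
}
```

Writing each pair from the returned vector to the output file would then give one line per word.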

My second issue is that when I try to read in large text documents (I'm using John Collier's story "Bottle Party", http://ciscohouston.com/docs/docs/greats/bottle_party.html, to test it), my program crashes completely, but it doesn't crash when I test it with a smaller text document (say, about 5-10 lines of text). I'm using Visual Studio (which I'm also new to), and I don't know what the error message is trying to tell me, but it says: (screenshot: error message)

After selecting retry: (screenshot: error2)

As always, any and all help is greatly appreciated.

Code here:

#include <iostream>
#include <string> //for strings
#include <fstream> //for files
#include <set> //for use of multiset

using namespace std;

string cleanUpPunc(string);

//Global variables
multiset <string> words; //will change back to local variable later

int main() {
//Starting variables
string fileName1 = "", fileName2 = "", input = "", input2 = ""; //To hold the input file and the file we wish to print data to if desired
ifstream fileStream; //gets infor from file

//Program start
cout << "Welcome to Bags Program by Rachel Woods!" << endl;
cout << "Please enter the name of the file you wish to input data from: ";
getline(cin, fileName1);

//Trys to open file
try {
    fileStream.open(fileName1);
    if (!fileStream) {
        cerr << "Unable to open file, please check file name and try again." << endl;
        system("PAUSE");
        exit(1);
    }

    while (fileStream >> input) {
        input2 = cleanUpPunc(input); //sends the input word to check for punctation
        words.insert(input2); //puts the 'cleaned up' word into the multiset for counting
    }
    fileStream.close();

        //Sends it to a text document
        cout << "Please name the file you would like to put the results into: ";
        getline(cin, fileName2);

        ofstream toFile; //writes info to a file

                         //Code to put info into text file
        toFile.open(fileName2);
        if (toFile.is_open()) {

            multiset<string>::iterator pos;
            for (pos = words.begin(); pos != words.end(); pos++) {
                toFile << *pos << " " << words.count(*pos) << endl;
            }
            toFile.close();
            cout << "Results written to file!" << endl;

        }
        else {
            cout << "Could not create file, please try again." << endl;
        }

    }catch (exception e) {
    cout << "Stop that. ";
    cout << e.what();
}

cout << "Thanks for using this program!" << endl;
system("PAUSE");
return 0;
}

string cleanUpPunc(string maybe) {
//Takes out puncuation from string
//Variables
string takeOut = maybe;

//Method

for (int i = 0, len = maybe.size(); i < len; i++) {
    if (ispunct(takeOut[i])) {
        takeOut.erase(i--, 1);
        len = takeOut.size();
    }
}

return takeOut;
}
RJWoods
  • You have a debugger, so please use that to give us more information about what is going wrong, including the variables at that point. You may also find this link useful: https://ericlippert.com/2014/03/05/how-to-debug-small-programs/ . Finally, your bug has nothing to do with multisets. – Ken Y-N Nov 12 '18 at 02:13
  • Multisets allow duplicate elements, that's why you have multiple copies of strings. – ihavenoidea Nov 12 '18 at 02:18
  • I would suggest using a dictionary (`std::map` in C++). Check this answer, which addresses exactly your problem: https://stackoverflow.com/questions/16867944/counting-occurrences-of-each-word-in-a-text-file – ihavenoidea Nov 12 '18 at 02:30
  • ihavenoidea- I thought that multisets could add a count to an element to tell if it's been used more than once but not create more than one element for it and that sets could only tell you if it appeared but not give you a count of how many times it appeared? – RJWoods Nov 12 '18 at 02:30
  • In fact, multisets and sets (mathematically speaking) have nothing to do with the number of times an element is present. It's just that one allows duplicates and the other does not (among other things, like how each one does certain set operations). – ihavenoidea Nov 12 '18 at 02:35
  • As to your crashing problem, I don't have Visual Studio and therefore I can't reproduce the problem. The code runs fine here with the big text you provided. – ihavenoidea Nov 12 '18 at 02:37
  • Ken Y-N- Thank you for the link! However, I have done some of the things mentioned in it such as showing what errors Visual Studio has given me and as I mentioned in my question I'm not sure what they are trying to tell me. Before posting my question I went through my code more than a few times trying to figure out what exactly could cause it to have issues but alas was unable to spot what should probably be obvious to me thus why I am here asking for help. – RJWoods Nov 12 '18 at 02:38
  • @RJWoods That crash indicates you probably have a non-ASCII character being processed, and you're feeding that character into a function that expects ASCII characters. It has nothing to do with multiset. Also, your `cleanUpPunc` is way too much work. To do that in C++ is a one-line call to `std::erase_if`. – PaulMcKenzie Nov 12 '18 at 02:43
  • @RJWoods If you copied and pasted the text from that link into another file, beware that you could have copied over "fancy" characters that are not ASCII. Things like the "fancy quotes", the "double hyphen", or some other character. I know this happens a lot if you copy from, say, an email program or a word processing document as text, where the text may not all be ASCII. Go through the file character by character, and ensure all the characters are within the range 0 - 255 (ASCII is actually up to 127, not 255, but the function is asserting on > 255). – PaulMcKenzie Nov 12 '18 at 02:49
  • Rewrite your `cleanUpPunc()` function. That loop looks awfully dangerous. Just use `string cleanUpPunc(string s) { s.erase(std::remove_if(s.begin(), s.end(), ispunct), s.end()); return s;}`. This removes all the punctuation without all of that stuff going on in your loop. I wouldn't be surprised if your error just goes away also. – PaulMcKenzie Nov 12 '18 at 03:21
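
The `std::map` suggestion from the comments can be sketched as follows. This is only an illustration, not the code from the linked answer; the helper name `countWords` and the choice to take the text as a string are assumptions.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: counts whitespace-separated words in a piece of text.
std::map<std::string, std::size_t> countWords(const std::string& text) {
    std::map<std::string, std::size_t> counts;
    std::istringstream in(text);
    std::string word;
    while (in >> word) {
        ++counts[word];  // operator[] creates a missing entry with count 0 first
    }
    return counts;
}
```

The same loop works with an `ifstream` in place of the `istringstream`. Because a map keeps a single entry per key, writing it out gives one line per distinct word, and punctuation stripping can be applied to `word` before the increment.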
