Hello, I am new to Stack Overflow, so please pardon any newbie mistakes. I am building a program in C++ and have run into some problems. The program is supposed to let the user enter a file name, read the file, count the number of words in it, and print the first ten and the last ten words. It stops working as soon as the text file contains line breaks (returns used for spacing), and it also fails on text files with a large number of words. Can anyone point me in the right direction?

Thank you!!

#include <iostream>
#include <cmath>
#include <cstdlib>
#include <fstream>
#include <string>
#include <iomanip>
#include <stdio.h>
#include <sstream>
#include <algorithm>


using namespace std;

string  *tokenizedWords = new string[5000000]; //array to hold tokenized words
string *tokenizedReversed = new string[5000000]; //array to hold tokenized words reverse

int tokenize(string linesFromFile, string tokenizedWords[]);

int main()
{
    string linesFromFile; //holds the line read from a file
    int wordCount = 0; //int to keep wordcount in the while loop
    int firstLast = 10; //the amount for first ten and last ten words to print
    string theNameofTheFile; //to hold the filename

    cout << "Please enter a filename: " << endl; //asks the user
    cin >> theNameofTheFile;

    ifstream inputStream(theNameofTheFile.c_str());

    if(inputStream.is_open()) //check if the file opened correctly
    {
        while(std::getline(inputStream, linesFromFile)) //reads all of the lines
        {
            wordCount++;

            if(linesFromFile.length() > 0) //checks if the line is empty
            {

                wordCount = tokenize(linesFromFile,tokenizedWords);

                if(wordCount < firstLast) //if text file has less than ten words
                {
                    cout << "This text file is smaller than 10 words so I can not print first and last 10 words." << endl;
                    return 0;
                }

                if(wordCount > firstLast) //if textfile has more than ten words
                {

                    cout << endl;
                    cout << "The first ten words of the document are:" << endl;
                    cout << endl;

                    for(int j = 0; j < firstLast; j++)
                    {
                        cout << tokenizedWords[j] << endl;
                    }

                    cout << endl;
                    cout << "The last ten words of the document are:" <<  endl;
                    cout << endl;

                    std::reverse_copy(tokenizedWords, tokenizedWords+wordCount, tokenizedReversed);

                    for(int i = 0; i < firstLast; i++)
                    {
                        cout << tokenizedReversed[i] << endl;
                    }

                }
            }

            inputStream.close(); //close the file

            cout<< "Total amount of words is: " << wordCount << endl;
        }


    }

    else
    {
        //if the file could not be opened
        cout << "Sorry, the file " << theNameofTheFile << " does not exist" << endl;
    }

    return 0;
}

int tokenize(string linesFromFile, string tokenizedWords[])
{
    int totalWords (0);

    istringstream toTokenize(linesFromFile);
    while (toTokenize.good())
    {
        toTokenize >> tokenizedWords[totalWords++];
    }

    return (totalWords);
}
ejs
  • I recommend you read about [`std::vector`](http://en.cppreference.com/w/cpp/container/vector), [`std::copy`](http://en.cppreference.com/w/cpp/algorithm/copy), [`std::istream_iterator`](http://en.cppreference.com/w/cpp/iterator/istream_iterator) and [`std::back_inserter`](http://en.cppreference.com/w/cpp/iterator/back_inserter). With those you can read and "tokenize" in basically a single call to `std::copy`. – Some programmer dude Jan 25 '15 at 07:35
  • By the way, your logic for some things is off. For example, you don't check if the *file* is less than ten "words" long, you do that check for each *line*. – Some programmer dude Jan 25 '15 at 07:39
  • Oh, and with [`std::vector`](http://en.cppreference.com/w/cpp/container/vector) you don't have to allocate memory for *ten million* strings, of which you will use just an extremely small fraction. – Some programmer dude Jan 25 '15 at 07:41
  • `5000000` - because 4999999 is not enough and 5000001 is just too many? `std::vector<>` and `std::deque` would both come in handy for this problem. And you've shown no reason in your description to break this into lines to begin with. Your requirements state *words*. Lines should have nothing to do with this unless you're also attempting to resolve hyphenation. – WhozCraig Jan 25 '15 at 07:58
  • Short example using the standard library [can be seen here](http://ideone.com/LxP3GH). Best of luck. – WhozCraig Jan 25 '15 at 08:14

0 Answers