
I'm working on a homework problem that requires me to read words from an input file, along with an integer k. The solution needs to print a list of words and their frequencies, ranging from the most frequent to the k-th most frequent. If the number of unique words is smaller than k, then only output that number of words.

This would have been a piece of cake with containers like map, but the problem restricts me to using only vectors and strings, with no other STL containers.

I'm stuck at the point where I have a list of all the words in a file and their corresponding frequencies. Now I need to sort them according to their frequencies and output k words.

The problem is that sorting is difficult. The frequencies can have different numbers of digits, so if I sort the combined "count word" strings with std::sort, I'd have to pad the counts with leading zeros, and I can't know how many zeros to pad because the input isn't known in advance.

Here's my code for the function:

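#include <algorithm>
#include <fstream>
#include <string>
#include <vector>
using namespace std;

void remove_punc(string& w);  // assumed signature; strips punctuation, defined elsewhere (not shown)

void word_frequencies(ifstream& inf, int k)
{
    vector <string> input;
    string w;
    while (inf >> w)
    {
        remove_punc(w);
        input.push_back(w);
    }
    sort(input.begin(), input.end());  // equal words are now adjacent

    // initialize frequency vector: every word occurs at least once
    vector <int> freq(input.size(), 1);

    // count actual frequencies: the last word of each run of equal words
    // ends up holding the length of that run
    int count = 0;
    for (size_t i = 0; i + 1 < input.size(); ++i)  // i + 1 avoids size_t underflow on empty input
    {
        if (input[i] == input[i+1])
        {
            ++count;
        } else
        {
            freq[i] += count;
            count = 0;
        }
    }
    if (!input.empty()) freq.back() += count;  // close the final run

    // words+frequencies: one "count word" entry per distinct word
    vector <string> wf;
    for (size_t i = 0; i < input.size(); ++i)
    {
        if (i + 1 == input.size() || input[i] != input[i+1])
        {
            string s = to_string(freq[i]) + " " + input[i];
            wf.push_back(s);
        }
    }
}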

Also, should I even combine each frequency and its word into one string in the first place? I know this is messy, so I'm looking for a more elegant solution.

Thanks!

Engineero
  • Is your assignment written such that you can copy and paste it? I think some information is missing. It doesn't make much sense to do this without maintaining a count of each word somewhere. As you said, a map, unordered_map, or hash table would make more sense to me than just vectors and strings. You could make a vector of std::pair, but I doubt that is what your instructor is going for. – Christopher Pisz May 10 '17 at 22:45
  • Since SO restricts the length of a comment, here are the instructions under Homework -> hw5: http://terminus.scu.edu/~ntran/csci61-s17/index.html – merukii6912 May 10 '17 at 23:22
  • I'm not allowed to use ANY STL container other than vector and string. So that means pairs are not allowed. – merukii6912 May 10 '17 at 23:24
  • Well, I guess you can maintain 2 vectors: one with the words and one with the number of occurrences, and keep them synced such that the index of one corresponds with the index of the other. You'd have to keep them in sync every time you sort, insert, or erase. You'd have to search the word vector for each word as you parse the file, increment its occurrence count, and sort when done. Awfully inefficient, and it seems to be teaching you bad habits rather than something useful, though. – Christopher Pisz May 10 '17 at 23:27
  • You could just define your own pair-like struct. After all, it's just a simple struct without much functionality. You wouldn't even need to template it, as you know the types you need. – Paul Rooney May 10 '17 at 23:28

1 Answer


If I understand you, your problem is that you want to sort your frequency vector, but then you lose track of the corresponding words. As suggested, using a struct with a custom comparison function (here, an overloaded operator<) is probably desirable:

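#include <string>

// one entry per distinct word, keeping each count attached to its word
struct word_freq {
    int freq;
    std::string word;
};

// compare by frequency only, so std::sort orders entries from least to most frequent
bool operator<(const word_freq& a, const word_freq& b) {
    return a.freq < b.freq;
}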

Now, with a std::vector<word_freq> wf, calling std::sort(wf.begin(), wf.end()) orders your list from lowest to highest frequency. To print the k words with the highest frequency, print from the back of wf, stopping early if wf has fewer than k entries.

pingul