Print the most repeated word in a string

Question

Write a java program to find the most repeated word in a string and also print its frequency.

INPUT

are you are

OUTPUT

are: 2

This question can be done by using HashMap or file reader (I suppose) but actually, I haven't learned them yet.

Yet, I managed to write a code that displays the frequency (but not the word)

import java.util.Scanner;
class duplicatewords
   {
       void main()
       {
           Scanner sc=new Scanner(System.in);
           System.out.println("Enter the string");
           String str=sc.nextLine();
           String arr[]=str.split(" ");
           int count=1; int checkvalue=0;
           for(int i=0;i<arr.length-1;i++)
           {
               String temp=arr[i];
               for(int j=i+1;j<arr.length;j++)
               {
                   String anothertemp=arr[j];
                   if(temp.equalsIgnoreCase(anothertemp))
                   count++;
                }
                if(checkvalue<c)
                checkvalue=c;
                c=1;
            }
            System.out.println(checkvalue);
        }
    }

I want to know how to print the word also without using any map or reader.

I think the program will be a very complicated one but I will understand.

Any help will be greatly appreciated.

You need some form of map to map each word to its count. If you do not want to use the existing `Map` implementations of Java, you can start by implementing your own map structure using only arrays, but I think this will be a futile effort. — knittl, Jan 25 '22 at 16:09
yeah, ikr. So should I assume this question cannot be done without using any map? — Zootopia, Jan 25 '22 at 16:10
Will, counting the number of unique words work and, maybe, then something can be done? — Zootopia, Jan 25 '22 at 16:12
You need an associative data structure that allows a counter per word, so that it can hold {are: 2, you: 1}. Since the key of that structure is `String`, you cannot use an array without converting the key to `int`, at which point you've arrived at what a `HashMap` does. — f1sh, Jan 25 '22 at 16:12
@Zootopia It can be done without a `Map`. That is just probably the easiest way to do it. If this is a school assignment, you should do it using the means that they have taught you to this point. — hfontanez, Jan 25 '22 at 16:12
this is not a school assignment ....it was asked in an interview — Zootopia, Jan 25 '22 at 16:12
You can use 2 arrays: one holds the words, one holds the count. The connection is made using the same index value. — f1sh, Jan 25 '22 at 16:13
@Zootopia you may want to check this answer I provided for a similar problem: https://stackoverflow.com/questions/67026362/calculate-keyword-density-of-each-unique-element-in-list-in-java/67051607#67051607. Let me know what you think. For your problem, however, all you need is a `Map` where the key is the word and the value is the count. Keep in mind that you need to get the value of a given key and if it exists, increment the value by one. If not, put the key value pair with the value set to 1. — hfontanez, Jan 25 '22 at 16:17
Or an array which holds an `Object{word,count}` and then always iterate the array to find the object to update. This works, but it is very inefficient — knittl, Jan 25 '22 at 16:21
An array solution, while functional, is very slow for retrieval. — hfontanez, Jan 25 '22 at 16:22
@Zootopia I think you need to consider the edge case when more than one word have the same frequency. For example, what if two or three words are the most frequent words? Do you want to return _any_ of these words or do you want to return _all_ words? If you need to know _all_ words, you need to consider the solution I posted. — hfontanez, Jan 25 '22 at 17:33
@Zootopia You're welcome. I wanted you to consider other use cases. That's why I provided you with the link of my other post where I created a custom data structure to capture this data. Then, on my answer here provided two solutions, with one a bit better than the other. Sometimes, solution one is good enough and sometimes it is not. I still believe solution two is better for the aforementioned reasons. — hfontanez, Jan 25 '22 at 18:59

f1sh · Answer 1 · 2022-01-25T16:41:03.037

Here is my solution using 2 arrays:

public static void main(String[] args) {
    String input = "are you are";
    String[] words = input.split(" ");

    //the 10 is the limit of individual words:
    String[] wordsBucket = new String[10];
    Integer[] countBucket = new Integer[10];

    for(String word:words){
        int index = findIndex(word, wordsBucket);
        incrementIndex(countBucket, index);
    }
    int highest = findMax(countBucket);

    System.out.println(wordsBucket[highest]+": "+countBucket[highest]);
}

private static int findMax(Integer[] countBucket) {
    int max = 0;
    int maxIndex = 0;
    for(int i=0;i<countBucket.length;i++) {
        if(countBucket[i]==null) {
            break;
        }
        if(countBucket[i] > max) {
            max = countBucket[i];
            maxIndex = i;
        }
    }
    return maxIndex;
}

private static int findIndex(String word, String[] wordsBucket) {
    for(int i=0;i<wordsBucket.length;i++) {
        if(word.equals(wordsBucket[i])) {
            return i;
        }
        if(wordsBucket[i] == null) {
            wordsBucket[i] = word;
            return i;
        }
    }
    return -1;
}

private static void incrementIndex(Integer[] countBucket, int index) {
    if(countBucket[index] == null){
        countBucket[index] = 1;
    } else {
        countBucket[index]++;
    }
}

This prints are: 2. As @knittl pointed out in the comments this can also be done with 1 array of Pair<String, Integer> or something similar.

If you are allowed to use Maps and streams, then this works too (using the same String[] words as input as above):

Map<String, Integer> countingMap = new HashMap<>();
Arrays.stream(words).forEach(w->countingMap.compute(w, (ww,c)->c==null?1:c+1));
Map.Entry<String, Integer> h = countingMap.entrySet().stream().sorted(Comparator.comparingInt(Map.Entry<String,Integer>::getValue).reversed()).findFirst().get();
System.out.println(h);

This prints are=2

Nowhere Man · Accepted Answer · 2022-01-25T17:42:16.007

In fact, to get the most frequent word, existing code needs to be modified slightly just to provide a variable for the most repeated variable which has to be updated when a more frequent word is detected. No extra arrays/data structures is needed for this specific task.

String arr[] = str.split(" ");
int maxFreq = 0;
String mostRepeated = null;

for (int i = 0; i < arr.length; i++) {
    String temp = arr[i];
    int count = 1;
    for (int j = i + 1; j < arr.length; j++) {
        if (temp.equalsIgnoreCase(arr[j]))
            count++;
    }
    if (maxFreq < count) {
       maxFreq = count;
       mostRepeated = temp;
    }
}
System.out.println(mostRepeated + ": " + maxFreq);

For the input:

String str = "I am he as you are he as you are me and we are all together";

Output:

are: 3

A bit faster implementation could include setting the duplicated values to null to skip them later:

for (int i = 0; i < arr.length; i++) {
    if (null == arr[i]) continue;
    String temp = arr[i];
    int count = 1;
    for (int j = i + 1; j < arr.length; j++) {
        if (temp.equalsIgnoreCase(arr[j])) {
            count++;
            arr[j] = null;
        }
    }
    if (maxFreq < count) {
       maxFreq = count;
       mostRepeated = temp;
    }
}

WJS · Answer 3 · 2022-01-25T18:16:14.953

Here is one way. It just maintains a list of words to count and adjust the max while iterating thru the list.

scratch arrays are allocated to the max number of words.
cnt is adjust to control iterating thru updated arrays based on discovered words.

String s =
        "this when this how to now this why apple when when other now this now";
String[] words = s.split("\\s+");
int cnt = 0;
int idx = -1;
String[] list = new String[words.length];
int[] count = new int[words.length];
int max = 0;
for (int i = 0; i < words.length; i++) {
    for (int k = 0; k < cnt; k++) {
        if (list[k].equals(words[i])) {
            count[k]++;
            if (count[k] > max) {
                max = count[k];
                idx = k;
            }
            break;
        }
    }
    count[cnt] = 1;
    list[cnt++] = words[i];
}

System.out.println(words[idx] + " " + max);

prints

this 4

And here is yet another solution using streams. This simply creates a map of word counts then finds the first entry that has the greatest count. Ties are ignored.

Entry<String, Integer> result = Arrays.stream(s.split("\\s+"))
        .collect(Collectors.toMap(r -> r, q -> 1,
                (a, b) -> a + 1))
        .entrySet().stream().max(Entry.comparingByValue())
        .get();

System.out.println(result);

prints

this=4

hfontanez · Answer 4 · 2022-01-25T17:43:12.540

There are two solutions, but one is better than the other.

Solution one: use `Map<String, Integer>`

public class WordCount {
    public static void main(String[] args) {
        String phrase = "This example is very good but is not very efficient";
        String[] words = phrase.split(" ");
        List<String> wordList = Arrays.asList(words);
        
        Map<String, Integer> wordCountMap = wordList.parallelStream().
                collect(Collectors.toConcurrentMap(
                        w -> w, w -> 1, Integer::sum));
            
        System.out.println(wordCountMap);
    }
}

This produces the following output:

{but=1, very=2, not=1, efficient=1, This=1, is=2, good=1, example=1}

As you can see, very and is are tied for the most frequent word. This is why this solution is not the most efficient. What if you could reverse the map in order to group words with similar frequency together?

Solution two: use `Map<Integer, List<String>>`

In my opinion, this is a better solution. This solution will group all words with similar count. Using the same input of the solution above, words with similar frequency will be bundled together. So, when querying the map for the highest count, both very and is will be returned as expected. Using Lambda expressions makes this task of "inverting" the map very easy:

Map<Integer, List<String>> mapInverted = 
        wordCountMap.entrySet()
           .stream()
           .collect(Collectors.groupingBy(Map.Entry::getValue, Collectors.mapping(Map.Entry::getKey, Collectors.toList())));
        
System.out.println(mapInverted);

After adding these lines to the sample code from solution one, I now have an aggregation of words of similar word counts:

{1=[but, not, efficient, This, good, example], 2=[very, is]}

For both of these approaches, the way to get the max value:


Entry<String, Integer> maxEntry = Collections.max(wordCountMap.entrySet(),
                (Entry<String, Integer> e1, Entry<String, Integer> e2) -> e1.getValue().compareTo(e2.getValue())); // for solution one
System.out.println(maxEntry); // outputs: very=2

Entry<Integer, List<String>> maxKey = Collections.max(mapInverted.entrySet(),
                (Entry<Integer, List<String>> e1, Entry<Integer, List<String>> e2) -> e1.getKey().compareTo(e2.getKey())); // for solution two
System.out.println(maxKey); // outputs: 2=[very, is]

Print the most repeated word in a string

INPUT

OUTPUT

4 Answers4

Solution one: use Map<String, Integer>

Solution two: use Map<Integer, List<String>>

Solution one: use `Map<String, Integer>`

Solution two: use `Map<Integer, List<String>>`