
I am trying to generate a random word and then check whether it is a real word using a word index (a HashMap). My code keeps ending up in a never-ending loop; can someone help me figure out how to fix it? The idea is to generate random characters and, after each character, check whether the result exists in the HashMap: first by looking up the key, which is the first letter of the word, and then by comparing against the substring of each word stored under that key.

  public ArrayList<String> randomSize(int length) {
            ArrayList<String> randomWords = new ArrayList<>(length);
            String letters = "abcdefghijklmnopqrstuvwxyz";
            // for loop add a new slot into randomWords
            for (int eachSlot = 0; eachSlot < length; eachSlot++) {
                // for each slot generate a random length for the word
                int randWordLength = (int) (Math.random() * (10 - 0) + 0);
                // every slot generate a random firstLetter
                int slotFound = 0;
                String firstConfirmedLetter = "";
                // while the first letter is not found in WordIndex
                while (slotFound == 0) {
                    int randNumer = (int) (Math.random() * (24 - 0) + 0);
                    String firstLetter = letters.substring(randNumer, randNumer + 1);
                    if (wordIndex.containsKey(firstLetter) == true)
                    {
                        firstConfirmedLetter = firstLetter;
                        randomWords.add(firstConfirmedLetter);
                        System.out.println(firstLetter);// working
                        randWordLength--;
                        slotFound = 1;
                        // if it is found end the while loop
                    }
    
                }
    
                // we found the first letter, now we need to find the rest of the letters
                for (int eachLetter = 0; eachLetter < randWordLength; eachLetter++){
                    int isFound = 0;
                    // loop until a random letter, appended to the letters chosen so far, forms a valid prefix
                    while (isFound == 0){
                        // generate a random letter
                        int randLetter = (int) (Math.random() * (24 - 0) + 0);
                        String nextLetter = letters.substring(randLetter, randLetter + 1);
                        //create curr word
                        String currWord = randomWords.get(eachSlot) + nextLetter;// works until here
                        // loop through each word in wordIndex to find a match
                        System.out.println(wordIndex.get(firstConfirmedLetter).size());
                        for(int i = 0; i< wordIndex.get(firstConfirmedLetter).size(); i++){
                            String test = wordIndex.get(firstConfirmedLetter).get(i);
                            if(test.length() > eachLetter+2){
                              System.out.println(test.substring(0,eachLetter+2));
                              if(test.substring(0,eachLetter+2).equals(currWord)){
                                  String currState = randomWords.get(eachSlot);
                                  randomWords.set(eachSlot,currWord);
                                  isFound =1;
                              }
                            }
                        }
    
                    }
                }
    
    
    
            }
            return randomWords;
        }
Nexteon
  • You underestimate the universe of existing words. – leoconco Sep 30 '21 at 22:17
  • You can actually use more than one method to solve your problem. Creating a semi-random word consisting of 5 letters, where letters 1, 3, and 5 are consonants and letters 2 and 4 are vowels might get you a hit if you let the program run over a weekend. – Gilbert Le Blanc Sep 30 '21 at 22:43
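
A minimal sketch of the consonant/vowel pattern idea from the comment above. It assumes the dictionary is available as a `Set<String>` of lowercase words; the class name `PatternWordGuesser`, the attempt budget, and the stand-in dictionary are made up for illustration. The attempt cap is there so the search can give up instead of looping forever.

```java
import java.util.Optional;
import java.util.Random;
import java.util.Set;

class PatternWordGuesser {
    private static final String CONSONANTS = "bcdfghjklmnpqrstvwxyz";
    private static final String VOWELS = "aeiou";

    // Builds a string of the given length that alternates
    // consonant, vowel, consonant, ... starting with a consonant.
    static String randomPattern(int length, Random rng) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            String pool = (i % 2 == 0) ? CONSONANTS : VOWELS;
            sb.append(pool.charAt(rng.nextInt(pool.length())));
        }
        return sb.toString();
    }

    // Keeps guessing until a candidate is in the dictionary or the
    // attempt budget is used up; an empty Optional means "no hit".
    static Optional<String> guessWord(Set<String> dictionary, int length,
                                      int maxAttempts, Random rng) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            String candidate = randomPattern(length, rng);
            if (dictionary.contains(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Stand-in dictionary (assumption); a real one would be much larger.
        Set<String> dictionary = Set.of("robot", "lemon", "tiger", "paper");
        System.out.println(guessWord(dictionary, 5, 5_000_000, new Random()));
    }
}
```

This only ever finds words that fit the pattern, so it trades coverage for a much smaller search space (21 × 5 × 21 × 5 × 21 candidates for five letters).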

2 Answers


The odds of generating a valid word randomly are low, so this approach is inefficient. Instead, randomly select a valid word from your dictionary:

private final List<String> words = wordIndex.values().stream()
    .flatMap(List::stream)
    .collect(Collectors.toList());

public List<String> randomSize(int length) {
  Collections.shuffle(words);
  return new ArrayList<>(words.subList(0, length));
}

Your code is very difficult to read and therefore difficult to fix, but here are some bugs in it:

  • The randWordLength chosen can be zero, but this will still result in a one-character word being selected. I presume you intended to select words of length 1–10, inclusive.

  • You only randomly choose from the first 24 letters of your 26-letter set, so 'y' and 'z' are never generated. Presumably, your dictionary contains words with all 26 letters, but words containing those two letters can never be found.

  • Most importantly, you are testing prefixes of words of any length, but you keep looping until you find a word of the specified length. As a simplified example, consider a dictionary of two words, "ab" and "acc". If a length of three is required, but the loop chooses the prefix "ab", it will loop forever, trying to find a three-letter word that starts with "ab".

You must match prefixes only against words of the required length.
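
Here is a rough sketch of what the corrected prefix-building loop could look like. It assumes `wordIndex` is a `Map<String, List<String>>` keyed by first letter, as the question suggests, and that words are lowercase a–z; the class and method names are hypothetical. Unlike the original code it flattens the index rather than looking up the first-letter bucket, and it keeps only words of exactly the target length as candidates, so every accepted prefix can still be completed.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Random;
import java.util.stream.Collectors;

class PrefixWordBuilder {
    private static final String LETTERS = "abcdefghijklmnopqrstuvwxyz";

    // Grows a word one random letter at a time, accepting a letter only
    // if some word of exactly targetLength still starts with the prefix.
    static Optional<String> randomWord(Map<String, List<String>> wordIndex,
                                       int targetLength, Random rng) {
        // Candidates: words of the required length made only of a-z.
        List<String> candidates = wordIndex.values().stream()
            .flatMap(List::stream)
            .filter(w -> w.length() == targetLength && w.matches("[a-z]+"))
            .collect(Collectors.toList());
        if (candidates.isEmpty()) {
            return Optional.empty(); // nothing to find, so do not loop
        }
        String prefix = "";
        while (prefix.length() < targetLength) {
            // All 26 letters are eligible, not just the first 24.
            String attempt = prefix + LETTERS.charAt(rng.nextInt(LETTERS.length()));
            List<String> remaining = candidates.stream()
                .filter(w -> w.startsWith(attempt))
                .collect(Collectors.toList());
            if (!remaining.isEmpty()) {
                prefix = attempt;       // letter accepted
                candidates = remaining; // narrow the candidate set
            }
        }
        return Optional.of(prefix);
    }

    public static void main(String[] args) {
        // The two-word dictionary from the example above.
        Map<String, List<String>> wordIndex = Map.of("a", List.of("ab", "acc"));
        Random rng = new Random();
        System.out.println(randomWord(wordIndex, 3, rng)); // Optional[acc]
        int randomLength = 1 + rng.nextInt(10);            // 1..10 inclusive
        System.out.println(randomWord(wordIndex, randomLength, rng));
    }
}
```

With the "ab"/"acc" dictionary and a length of three, the prefix "ab" is never accepted because "ab" is not a three-letter candidate, so the loop cannot get stuck on it.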

erickson
  • Yeah, I already did that. But would you know an efficient way of generating a word and checking it against my dictionary? Thanks – Nexteon Sep 30 '21 at 23:30
  • @Nexteon The only way to get more efficient with a generative strategy is to make it less random. For example, you could use [a Markov process](https://en.wikipedia.org/wiki/Markov_chain) to increase the probability of choosing a valid letter based on previously chosen letters. This doesn't eliminate random generation entirely, but it's like using weighted dice to improve your odds. – erickson Sep 30 '21 at 23:56
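
To illustrate the Markov idea from the comment above, here is a small sketch of a first-order letter model: it counts which letter follows which in the dictionary and then samples the next letter in proportion to those counts. Everything here (the class name `MarkovWordGenerator`, the 27×27 count table, training on the dictionary itself) is an assumption for illustration, not something from the thread. Generated strings are more word-like than uniform ones, but they still need checking against the dictionary.

```java
import java.util.Collection;
import java.util.List;
import java.util.Random;

class MarkovWordGenerator {
    private static final int START = 26; // extra row: "no letter yet"
    private static final int END = 26;   // extra column: "word ends here"
    // counts[prev][next]: how often `next` followed `prev` in training.
    private final int[][] counts = new int[27][27];

    MarkovWordGenerator(Collection<String> words) {
        for (String w : words) {
            if (!w.matches("[a-z]+")) continue; // skip anything not a-z
            int prev = START;
            for (char c : w.toCharArray()) {
                counts[prev][c - 'a']++;
                prev = c - 'a';
            }
            counts[prev][END]++;
        }
        if (rowTotal(counts[START]) == 0) {
            throw new IllegalArgumentException("no usable training words");
        }
    }

    // Samples letters until the END state is drawn or maxLength is reached.
    String generate(Random rng, int maxLength) {
        StringBuilder sb = new StringBuilder();
        int prev = START;
        while (sb.length() < maxLength) {
            int next = sample(counts[prev], rng);
            if (next == END) break;
            sb.append((char) ('a' + next));
            prev = next;
        }
        return sb.toString();
    }

    private static int rowTotal(int[] row) {
        int total = 0;
        for (int c : row) total += c;
        return total;
    }

    // Picks an index with probability proportional to its count.
    private static int sample(int[] row, Random rng) {
        int r = rng.nextInt(rowTotal(row));
        for (int i = 0; i < row.length; i++) {
            r -= row[i];
            if (r < 0) return i;
        }
        throw new IllegalStateException("unreachable for a non-empty row");
    }

    public static void main(String[] args) {
        List<String> dictionary = List.of("cat", "car", "cab", "bat", "tar");
        MarkovWordGenerator gen = new MarkovWordGenerator(dictionary);
        Random rng = new Random();
        for (int i = 0; i < 5; i++) {
            String candidate = gen.generate(rng, 10);
            System.out.println(candidate + " valid=" + dictionary.contains(candidate));
        }
    }
}
```

A row is only ever visited after a transition into it was observed during training, so every sampled row has at least one count.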

You underestimate the universe of existing words. There are 24^10 possibilities, that is about 6.34e13 possible words. The loop is not infinite if you let it run long enough; it is just a terrible strategy.

You could increase the probability of early hits by combining weighted random generation (as in https://stackoverflow.com/a/6409791/1803810) with a distribution based on the frequency of letters in English (http://pi.math.cornell.edu/~mec/2003-2004/cryptography/subs/frequencies.html).
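
As a rough sketch of weighted letter selection: the version below uses a cumulative-count `TreeMap`, which is one common way to do weighted picks. To avoid hard-coding a frequency table, it derives the weights from letter counts in the dictionary itself; that substitution, and the class name `WeightedLetterPicker`, are assumptions made for this example. A published English frequency table could be plugged in the same way.

```java
import java.util.Collection;
import java.util.List;
import java.util.NavigableMap;
import java.util.Random;
import java.util.TreeMap;

class WeightedLetterPicker {
    // Maps each running total of letter counts to the letter that ends there.
    private final NavigableMap<Integer, Character> cumulative = new TreeMap<>();
    private int total = 0;

    // Weights come from how often each letter occurs in the given words.
    WeightedLetterPicker(Collection<String> words) {
        int[] counts = new int[26];
        for (String w : words) {
            for (char c : w.toCharArray()) {
                if (c >= 'a' && c <= 'z') counts[c - 'a']++;
            }
        }
        for (int i = 0; i < 26; i++) {
            if (counts[i] == 0) continue; // letters never seen get no weight
            total += counts[i];
            cumulative.put(total, (char) ('a' + i));
        }
        if (total == 0) {
            throw new IllegalArgumentException("no letters to weight");
        }
    }

    // Draws r in [0, total) and returns the letter whose range contains r.
    char next(Random rng) {
        int r = rng.nextInt(total);
        return cumulative.higherEntry(r).getValue();
    }

    public static void main(String[] args) {
        WeightedLetterPicker picker =
            new WeightedLetterPicker(List.of("tree", "seen", "teen", "zest"));
        Random rng = new Random();
        StringBuilder sample = new StringBuilder();
        for (int i = 0; i < 20; i++) sample.append(picker.next(rng));
        System.out.println(sample); // mostly e, t, s, n; z is rare
    }
}
```

Swapping `picker.next(rng)` in for a uniform letter choice makes common letters come up more often, which raises the hit rate without changing the rest of the generator.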

leoconco
  • If you fix the bugs, it's actually not that slow (milliseconds per word). – erickson Oct 01 '21 at 17:50
  • Even at 1 millisecond per attempt (it's probably around 20), 24^10 possible generated words, to find one of the 171,476 in English, means, if uniformly distributed, roughly 370 million attempts before the first hit: about 4 days at 1 ms each, and months at 20 ms – leoconco Oct 01 '21 at 20:01
  • No. I am talking about the code in the question. It isn't generating random character sequences and then checking whether they are in the dictionary. A letter is only added to the word being built if it is a prefix of a valid word. So (with bugs fixed), the code in question outputs a word in a fraction of a second. – erickson Oct 01 '21 at 20:04