
I am loading a file that holds nearly 80,000 words, which will be used as the primary spell-checking dictionary. The order of the words has been randomized. I am also loading a second file that contains the misspelled words I have to check. The program should also suggest corrections for each misspelled word.

public void spellCheckDocument(ArrayList<String> dictionary){
        long startCheck = System.currentTimeMillis();
        for(String words: collectionOfParagraphs)
            for(String word: words.split("[^a-zA-Z_0-9']+")){
                int index = Collections.binarySearch(dictionary, word.toLowerCase());
                if(index<0 && word.length()>0){

                    //collectionOfMisspelledWord.add(word+" Possible correct word: "+dictionary.get(-index+1)+" "+dictionary.get(-index)+" "+dictionary.get(-index-1));
                    //System.out.printf("%s Misspelled, possible correct words: %s, %s, %s\n", word, dictionary.get(-index+1),dictionary.get(-index),dictionary.get(-index-1));
                    possibleCorrectSpellings = new Document(word, dictionary.get(-index+1),dictionary.get(-index), dictionary.get(-index-1));
                    collectionOfMisspelledWord.add(possibleCorrectSpellings);
                }
            }
}

--------error----------
java.lang.IndexOutOfBoundsException: Index: 380, Size: 379
    at java.util.ArrayList.rangeCheck(ArrayList.java:653)
    at java.util.ArrayList.get(ArrayList.java:429)
    at file.Document.spellCheckDocument(Document.java:82)
  • possibleCorrectSpellings = new Document(word, dictionary.get(-index+1),dictionary.get(-index), dictionary.get(-index-1)); – V15720002000 Oct 28 '14 at 12:23

1 Answer


From the documentation of Collections.binarySearch():

otherwise, (-(insertion point) - 1). The insertion point is defined as the point at which the key would be inserted into the list: the index of the first element greater than the key, or list.size() if all elements in the list are less than the specified key.

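That means the insertion point can equal list.size(), which is already one past the last element. Since index = -(insertion point) - 1, your three lookups dictionary.get(-index-1), dictionary.get(-index) and dictionary.get(-index+1) read the insertion point and the two positions after it, so they can run past the end of the list. You need to add special handling for this case (which probably means that you have no idea which words could be correct).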
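One possible way to add that handling is sketched below. It recovers the insertion point from the negative result and clamps the range of neighbouring entries to the valid indices before reading them. The class name SpellSuggestions and the method neighbors are hypothetical names chosen for this illustration, not part of the question's code. The sketch also assumes the list is sorted in ascending order, which Collections.binarySearch() requires; since the question says the word order was randomized, the dictionary would need a Collections.sort(dictionary) call after loading.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration; not taken from the question's code.
final class SpellSuggestions {

    /**
     * Returns up to three dictionary entries around the position where
     * word would be inserted, or an empty list if word is in the dictionary.
     * Assumption: sortedDictionary is sorted in natural (ascending) order.
     */
    static List<String> neighbors(List<String> sortedDictionary, String word) {
        int index = Collections.binarySearch(sortedDictionary, word);
        if (index >= 0) {
            return Collections.emptyList(); // exact match: spelled correctly
        }
        // binarySearch returned -(insertion point) - 1, so invert it.
        int insertionPoint = -index - 1; // ranges from 0 to size(), inclusive
        // Clamp the window [insertionPoint - 1, insertionPoint + 1] to valid indices.
        int from = Math.max(0, insertionPoint - 1);
        int to = Math.min(sortedDictionary.size(), insertionPoint + 2); // exclusive
        return new ArrayList<>(sortedDictionary.subList(from, to));
    }
}
```

A word that sorts after every dictionary entry then yields only the last entry instead of throwing, and a word that sorts before every entry yields the first two. Whether alphabetical neighbours are useful as spelling suggestions is a separate question; the sketch only removes the IndexOutOfBoundsException.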

Aaron Digulla