Peter Norvig's famous spellchecker (Java 8 version here) can correct single words, provided something close to that word appears in the training data. But how can I adapt it to handle entire phrases? For example, suppose I have a file where each phrase is on its own line:
Plastic box
Pencils and sketch
Romeo and Juliet
.
.
.
If I tell the algorithm to correct 'Platic', it should return 'Plastic box'. Similarly, if I tell it to correct 'Pencils', it should return 'Pencils and sketch'.
I tried changing the following lines of the above code (Java version):
Stream.of(new String(Files.readAllBytes( dictionaryFile )).toLowerCase().replaceAll("[^a-z ]","").split(" ")).forEach( (word) ->{
dict.compute( word, (k,v) -> v == null ? 1 : v + 1 );
});
to
Stream.of(new String(Files.readAllBytes( dictionaryFile )).toLowerCase().split("\n")).forEach( (word) ->{
dict.compute( word, (k,v) -> v == null ? 1 : v + 1 );
});
but it didn't seem to work: the corrector no longer returns the expected phrases.
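A likely reason the change has no effect: once each whole line is a dictionary key, a misspelled input such as 'platic' is still only expanded into candidates one or two edits away, and 'plastic box' is five edits away, so no key ever matches. One possible direction, shown here only as a minimal sketch, is to keep correcting single words but index each word back to the phrases that contain it. The class name `PhraseCorrector`, its `correct` method, and the restriction to edit distance 1 are assumptions made for illustration; they are not part of Norvig's code or the linked Java version, and the frequency ranking of the original is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: maps every word to the phrases that contain it,
// then corrects a single (possibly misspelled) word and returns those phrases.
class PhraseCorrector {
    private final Map<String, List<String>> phrasesByWord = new HashMap<>();

    PhraseCorrector(List<String> phrases) {
        for (String phrase : phrases) {
            // Split on anything that is not a lowercase letter, as the original dictionary code does.
            for (String word : phrase.toLowerCase().split("[^a-z]+")) {
                if (word.isEmpty()) continue;
                List<String> owners = phrasesByWord.computeIfAbsent(word, k -> new ArrayList<>());
                if (!owners.contains(phrase)) owners.add(phrase);
            }
        }
    }

    // All strings exactly one edit away: delete, transpose, replace, insert.
    static Set<String> edits1(String w) {
        Set<String> out = new HashSet<>();
        for (int i = 0; i <= w.length(); i++) {
            String left = w.substring(0, i);
            String right = w.substring(i);
            if (!right.isEmpty()) out.add(left + right.substring(1));
            if (right.length() > 1) out.add(left + right.charAt(1) + right.charAt(0) + right.substring(2));
            for (char c = 'a'; c <= 'z'; c++) {
                if (!right.isEmpty()) out.add(left + c + right.substring(1));
                out.add(left + c + right);
            }
        }
        return out;
    }

    // Returns the phrases containing the word, or containing a known word one edit away.
    List<String> correct(String input) {
        String word = input.toLowerCase().trim();
        if (phrasesByWord.containsKey(word)) {
            return new ArrayList<>(phrasesByWord.get(word));
        }
        List<String> result = new ArrayList<>();
        for (String candidate : edits1(word)) {
            List<String> owners = phrasesByWord.get(candidate);
            if (owners == null) continue;
            for (String phrase : owners) {
                if (!result.contains(phrase)) result.add(phrase);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        PhraseCorrector pc = new PhraseCorrector(
                Arrays.asList("Plastic box", "Pencils and sketch", "Romeo and Juliet"));
        System.out.println(pc.correct("Platic"));
        System.out.println(pc.correct("Pencils"));
    }
}
```

A word shared by several phrases (for example 'and') returns all of them, so some ranking, such as the phrase counts already kept in `dict`, would still be needed to pick a single answer.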