We are using a Stream to search an ArrayList of strings. The dictionary file is sorted and contains 307107 words, all in lower case.
We use findFirst to look for a match for each word of the text in a TextArea.
As long as the misspelling occurs beyond the third character, the search gives favorable results.
If the misspelled word is something like "Charriage", the results are nothing close to a match.
The obvious goal is to get as close to the correct word as possible without having to look at an enormous number of words.
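
To illustrate why a prefix-based search behaves this way, here is a minimal sketch. It assumes the search keeps only dictionary words that start with the first few characters of the typed word; the class name, the `prefixMatches` helper, and the five-word dictionary are hypothetical stand-ins, not the project's code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class PrefixSearchDemo {

    // Hypothetical helper: keeps the words that start with the first
    // prefixLength characters of the (lower-cased) search string.
    public static List<String> prefixMatches(List<String> dictionary, String searchString, int prefixLength) {
        String lowered = searchString.toLowerCase();
        String prefix = lowered.substring(0, Math.min(prefixLength, lowered.length()));
        return dictionary.stream()
                .filter(word -> word.startsWith(prefix))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Tiny stand-in for the real dictionary.
        List<String> dictionary = Arrays.asList("carriage", "carrier", "chariot", "charity", "homemaker");

        // Error near the end of the word: the prefix "carr" is intact.
        System.out.println(prefixMatches(dictionary, "Carriagge", 4)); // [carriage, carrier]

        // Error at position 2: the prefix "char" points at the wrong words.
        System.out.println(prefixMatches(dictionary, "Charriage", 4)); // [chariot, charity]
    }
}
```

Once an error lands inside the prefix, every candidate that survives the filter starts with the wrong letters, so no amount of later filtering can bring "carriage" back.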

Here is the text we are testing:
Tak acheive it hommaker and aparent as Chariage NOT ME Charriag add missing vowel to Cjarroage

We have made some major changes to the stream search filters, with reasonable improvements.
We will edit the posted code to include ONLY the part of the code where the search is failing,
and below that the code changes made to the stream filters.
Before the code change, if the searchString had a misspelled char at position 1, no results were found in the dictionary. The new search filters fixed that.
We also added more search information by increasing the number of chars used for endsWith.
So what is still failing? If the searchString (the misspelled word) is missing a char at the end of the word, and the word has an incorrect char anywhere from position 1 to 4, the search fails.
We are working on adding and removing chars, but we are not sure this is a workable solution.

Comments or code will be greatly appreciated. If you would like the complete project, we will post it on GitHub. Just ask in the comments.

The question is still: how do we fix this search filter when multiple chars are missing from the misspelled word?

After multiple hours of searching for a FREE txt dictionary, this is one of the best.
A sidebar fact: it has 115726 words that are > 5 in length and have a vowel at the end of the word. That means it has 252234 words with no vowel at the end.
Does that mean we have a 32% chance of fixing the issue by adding a vowel to the end of the searchString? NOT a question, just an odd fact!
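
For anyone who wants to reproduce a count like this, here is a minimal sketch of the stream that would do it. The class and method names are hypothetical, "vowel" is assumed to mean a, e, i, o, or u, and the six-word list stands in for the lines of words_alpha.txt.

```java
import java.util.Arrays;
import java.util.List;

public class VowelEndingCount {

    // Counts words longer than 5 characters whose last character is a vowel.
    public static long countLongWordsEndingInVowel(List<String> words) {
        return words.stream()
                .filter(w -> w.length() > 5)
                .filter(w -> "aeiou".indexOf(w.charAt(w.length() - 1)) >= 0)
                .count();
    }

    public static void main(String[] args) {
        // Stand-in sample; a real run would pass the dictionary lines instead.
        List<String> words = Arrays.asList("carriage", "homemaker", "zorilla", "take", "achieve", "apparent");
        long hits = countLongWordsEndingInVowel(words);
        // The share depends on which total it is divided by (all words, or only the long ones).
        System.out.printf("%d of %d words%n", hits, words.size());
    }
}
```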

Here is a link to the dictionary: words_alpha.txt. Download it and place the words_alpha.txt file on the C drive at C:/A_WORDS/words_alpha.txt

Code Before Changes

}if(found != true){

    lvListView.setStyle("-fx-font-size:18.0;-fx-background-color: white;-fx-font-weight:bold;");
    for(int indexSC = 0; indexSC < simpleArray.length;indexSC++){

    String NewSS = txtMonitor.getText().toLowerCase();

    if(NewSS.contains(" ")||(NewSS.matches("[%&/0-9]"))){
        String NOT = txtMonitor.getText().toLowerCase();
        txtTest.setText(NOT+" Not in Dictionary");
        txaML.appendText(NOT+" Not in Dictionary");
        onCheckSpelling();
        return;
    }

    int a = NewSS.length();
    int Z;
    if(a == 0){// manage CR test with two CR's
        Z = 0;
    }else if(a == 3){
        Z = 3;
    }else if(a > 3 && a < 5){
        Z = 4;
    }else if(a >= 5 && a < 8){
        Z = 4;
    }else{
        Z = 5;
    }

    System.out.println("!!!! NewSS "+NewSS+" a "+a+" ZZ "+Z);

    if(Z == 0){// Manage CR in TextArea
        noClose = true;
        strSF = "AA";
        String NOT = txtMonitor.getText().toLowerCase();
        //txtTo.setText("Word NOT in Dictionary");// DO NO SEARCH
        //txtTest.setText("Word NOT in Dictionaary");
        txtTest.setText("Just a Space");
        onCheckSpelling();   
    }else{
        txtTest.setText("");
        txaML.clear();
        txtTest.setText("Word NOT in Dictionaary");
        txaML.appendText("Word NOT in Dictionaary");
        String strS = searchString.substring(0,Z).toLowerCase();
        strSF = strS; 
    }
    // array & list use in stream to add results to ComboBox
    List<String> cs = Arrays.asList(simpleArray);
    ArrayList<String> list = new ArrayList<>();

    cs.stream().filter(s -> s.startsWith(strSF))
      //.forEach(System.out::println); 
    .forEach(list :: add);   

    for(int X = 0; X < list.size();X++){
    String A = (String) list.get(X);  

Improved New Code

        }if(found != true){

    for(int indexSC = 0; indexSC < simpleArray.length;indexSC++){

    String NewSS = txtMonitor.getText().toLowerCase();
    if(NewSS.contains(" ")||(NewSS.matches("[%&/0-9]"))){
        String NOT = txtMonitor.getText().toLowerCase();
        txtTest.setText(NOT+" Not in Dictionary");

        onCheckSpelling();
        return;
    }
    int a = NewSS.length();
    int Z;
    if(a == 0){// manage CR test with two CR's
        Z = 0;
    }else if(a == 3){
        Z = 3;
    }else if(a > 3 && a < 5){
        Z = 4;
    }else if(a >= 5 && a < 8){
        Z = 4;
    }else{
        Z = 5;
    }

    if(Z == 0){// Manage CR
        noClose = true;
        strSF = "AA";
        String NOT = txtMonitor.getText().toLowerCase();
        txtTest.setText("Just a Space");
        onCheckSpelling();

    }else{
        txtTest.setText("");
        txtTest.setText("Word NOT in Dictionaary");
        String strS = searchString.substring(0,Z).toLowerCase();
        strSF = strS; 
    }
    ArrayList list = new ArrayList<>(); 
    List<String> cs = Arrays.asList(simpleArray);
    // array list & list used in stream foreach filter results added to ComboBox
    // Code below provides variables for refined search
    int W = txtMonitor.getText().length();

    String nF = txtMonitor.getText().substring(0, 1).toLowerCase();

    String nE = txtMonitor.getText().substring(W - 2, W);
    if(W > 7){
    nM = txtMonitor.getText().substring(W-5, W);
    System.out.println("%%%%%%%% nE "+nE+" nF "+nF+" nM = "+nM);
    }else{
    nM = txtMonitor.getText().substring(W-1, W);   
    System.out.println("%%%%%%%% nE "+nE+" nF "+nF+" nM = "+nM);
    }

    cs.stream().filter(s -> s.startsWith(strSF)
            || s.startsWith(nF, 0)
            && s.length()<= W+2
            && s.endsWith(nE)
            && s.startsWith(nF)
            && s.contains(nM)) 
    .forEach(list :: add);

    for(int X = 0; X < list.size();X++){
    String A = (String) list.get(X);
    sort(list);

    cboSelect.setStyle("-fx-font-weight:bold;-fx-font-size:18.0;");
    cboSelect.getItems().add(A);
    }// Add search results to cboSelect
    break;
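
One thing worth noting about the filter above: in Java, `&&` binds more tightly than `||`, so the expression reads as `startsWith(strSF) || (all the remaining conditions)`. Here is a minimal sketch that makes that grouping explicit. The class and the `buildFilter` method are hypothetical; the parameter names mirror the variables in the posted code, and the four-word dictionary is a stand-in.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class FilterPrecedenceDemo {

    // Builds the same filter with explicit grouping.
    public static Predicate<String> buildFilter(String strSF, String nF, String nE, String nM, int w) {
        Predicate<String> prefixMatch = s -> s.startsWith(strSF);
        Predicate<String> refinedMatch = s -> s.startsWith(nF)
                && s.length() <= w + 2
                && s.endsWith(nE)
                && s.contains(nM);
        // Equivalent to: prefixMatch || (all refined conditions together)
        return prefixMatch.or(refinedMatch);
    }

    public static void main(String[] args) {
        List<String> dictionary = Arrays.asList("carriage", "chariot", "charge", "cabbage");
        // Values the posted code would derive for "Charriage": W = 9, Z = 5,
        // strSF = "charr", nF = "c", nE = "ge", nM = "riage".
        Predicate<String> filter = buildFilter("charr", "c", "ge", "riage", 9);
        System.out.println(dictionary.stream().filter(filter).collect(Collectors.toList())); // [carriage]
    }
}
```

Written this way it is easier to see that any word starting with strSF is accepted regardless of the length, ending, and middle checks, while the refined branch is what lets "carriage" through for "Charriage".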

Here is a screenshot of the FXML layout. The controls are named the same as in our code, with the exception of the ComboBox.
[Image: FXML layout]

Vector
  • [mcve] please .. and voting to close because you never provided any in the past – kleopatra Oct 31 '19 at 22:29
  • @Grendel We are looking, but the additional code is clouding our review. Will keep looking. How did you plan to search with endsWith after the startsWith fails? It is not a reliable method: we obtained odd results, and the code manipulation was too much work for the results – James_Duh Oct 31 '19 at 22:42
  • An MRE should not be too hard for you to write. Remove everything having to do with the UI, make a small sample dictionary in Java, and call your spell checker routine. Make sure to tell us the results you are seeing and what you expect to see. You will probably answer your own question by doing this exercise; if so, make sure to post the answer so people don't ignore you in the future for not staying engaged. – SephB Oct 31 '19 at 23:45
  • @SephB Sorry about the sample dictionary; I will post a link to the real dictionary. What was expected with the example word Charriage was to see any word from the dictionary that starts with Ca. This will never happen unless the "h" is removed from the misspelled word. We felt my original posted code was enough to work through the question with thought. I did post all the code and will add a screenshot of the FXML. Thanks – Vector Nov 01 '19 at 01:11
  • I suggest cleaning up the code. E.g. in `loadFile`, just use `data = Files.readAllLines(Paths.get("C:/A_WORDS/words_alpha.txt"));` Then, `new String[]{}` could be `new String[0]`, though you shouldn’t copy the list to an array just to convert it back to a `List` later on. Just consistently use the `List`. Note that `matches("%|&|/|0|1|2|3|4|5|6|7|8|9")` can be simplified to `matches("[%&/0-9]")`. In the case of `replaceAll("[!||.||?]","")`, the `|` has no meaning; it has the same effect as `replaceAll("[!|.?]","")`, but likely you actually meant `replaceAll("[!.?]","")`. – Holger Nov 01 '19 at 08:48
  • The condition `Arrays.toString(roar).contains("")` will always be true. The declaration `ArrayList list = new ArrayList<>();` is using a *raw type*. You should change it to `ArrayList<String> list = new ArrayList<>();` instead of using type casts in the subsequent code. And so on… – Holger Nov 01 '19 at 08:49
  • @Holger First, thanks for pointing out the errors. That said, this is my first time working with arrays and converting txt files to usable information. We knew something did not look correct, but the focus was on the spelling issue: if the misplaced character was in position 2 or 3, how to form a new search term. We will implement your suggested changes. To paraphrase the line from the movie Cool Hand Luke: what we have here is a novice trying to handle a professional project – Vector Nov 01 '19 at 17:17
  • @Holger We made some progress and posted the new stream filters. If you have time to take a look and offer suggestions, great; if not, thanks once again for the comments. – Vector Nov 04 '19 at 03:07
  • @SephB We posted new stream filter code. Thanks for the words of encouragement. If you have other suggestions, they will be greatly appreciated – Vector Nov 04 '19 at 03:09

3 Answers

I am adding a JavaFX answer. This app uses Levenshtein Distance. Click Check Spelling to start. You can select a word from the list to replace the word currently being checked. I noticed that Levenshtein Distance returns a lot of words, so you might want to find other ways to reduce the list even further.

Main

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javafx.application.Application;
import javafx.collections.FXCollections;
import javafx.collections.ObservableList;
import javafx.scene.Scene;
import javafx.scene.control.Button;
import javafx.scene.control.ListView;
import javafx.scene.control.TextArea;
import javafx.scene.control.TextField;
import javafx.scene.layout.VBox;
import javafx.stage.Stage;

public class App extends Application
{

    public static void main(String[] args)
    {
        launch(args);
    }

    TextArea taWords = new TextArea("Tak Carrage thiss on hoemaker answe");
    TextField tfCurrentWordBeingChecked = new TextField();
    //TextField tfMisspelledWord = new TextField();
    ListView<String> lvReplacementWords = new ListView<>();
    TextField tfReplacementWord = new TextField();

    Button btnCheckSpelling = new Button("Check Spelling");
    Button btnReplaceWord = new Button("Replace Word");

    List<String> wordList = new ArrayList<>();
    List<String> returnList = new ArrayList<>();
    HandleLevenshteinDistance handleLevenshteinDistance = new HandleLevenshteinDistance();
    ObservableList<String> listViewData = FXCollections.observableArrayList();

    @Override
    public void start(Stage primaryStage)
    {
        setupListView();
        handleBtnCheckSpelling();
        handleBtnReplaceWord();

        VBox root = new VBox(taWords, tfCurrentWordBeingChecked, lvReplacementWords, tfReplacementWord, btnCheckSpelling, btnReplaceWord);
        root.setSpacing(5);
        Scene scene = new Scene(root);
        primaryStage.setScene(scene);
        primaryStage.show();
    }

    public void handleBtnCheckSpelling()
    {
        btnCheckSpelling.setOnAction(actionEvent -> {
            if (btnCheckSpelling.getText().equals("Check Spelling")) {
                wordList = new ArrayList<>(Arrays.asList(taWords.getText().split(" ")));
                returnList = new ArrayList<>(Arrays.asList(taWords.getText().split(" ")));
                loadWord();
                btnCheckSpelling.setText("Check Next Word");
            }
            else if (btnCheckSpelling.getText().equals("Check Next Word")) {
                loadWord();
            }
        });
    }

    public void handleBtnReplaceWord()
    {
        btnReplaceWord.setOnAction(actionEvent -> {
            int indexOfWordToReplace = returnList.indexOf(tfCurrentWordBeingChecked.getText());
            returnList.set(indexOfWordToReplace, tfReplacementWord.getText());
            taWords.setText(String.join(" ", returnList));
            btnCheckSpelling.fire();
        });
    }

    public void setupListView()
    {
        lvReplacementWords.setItems(listViewData);
        lvReplacementWords.getSelectionModel().selectedItemProperty().addListener((obs, oldSelection, newSelection) -> {
            tfReplacementWord.setText(newSelection);
        });
    }

    private void loadWord()
    {
        if (wordList.size() > 0) {
            tfCurrentWordBeingChecked.setText(wordList.get(0));
            wordList.remove(0);
            showPotentialCorrectSpellings();
        }
    }

    private void showPotentialCorrectSpellings()
    {
        List<String> potentialCorrentSpellings = handleLevenshteinDistance.getPotentialCorretSpellings(tfCurrentWordBeingChecked.getText().trim());
        listViewData.setAll(potentialCorrentSpellings);
    }
}

CustomWord Class

/**
 *
 * @author blj0011
 */
public class CustomWord
{

    private int distance;
    private String word;

    public CustomWord(int distance, String word)
    {
        this.distance = distance;
        this.word = word;
    }

    public String getWord()
    {
        return word;
    }

    public void setWord(String word)
    {
        this.word = word;
    }

    public int getDistance()
    {
        return distance;
    }

    public void setDistance(int distance)
    {
        this.distance = distance;
    }

    @Override
    public String toString()
    {
        return "CustomWord{" + "distance=" + distance + ", word=" + word + '}';
    }
}

HandleLevenshteinDistance Class

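import java.io.File;
import java.io.IOException;
import java.net.URL;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import org.apache.commons.io.FileUtils;
import org.apache.commons.text.similarity.LevenshteinDistance;

/**
 *
 * @author blj0011
 */
public class HandleLevenshteinDistance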
{

    private List<String> dictionary = new ArrayList<>();

    public HandleLevenshteinDistance()
    {
        try {
            //Load DictionaryFrom file
            //See if the dictionary file exists. If it doesn't, download it from GitHub.
            File file = new File("alpha.txt");
            if (!file.exists()) {
                FileUtils.copyURLToFile(
                        new URL("https://raw.githubusercontent.com/dwyl/english-words/master/words_alpha.txt"),
                        new File("alpha.txt"),
                        5000,
                        5000);
            }

            //Load file content to a List of Strings
            dictionary = FileUtils.readLines(file, Charset.forName("UTF8"));
        }
        catch (IOException ex) {
            ex.printStackTrace();
        }

    }

    public List<String> getPotentialCorretSpellings(String misspelledWord)
    {
        LevenshteinDistance levenshteinDistance = new LevenshteinDistance();
        List<CustomWord> customWords = new ArrayList<>();

        dictionary.stream().forEach((wordInDictionary) -> {
            int distance = levenshteinDistance.apply(misspelledWord, wordInDictionary);
            if (distance <= 2) {
                customWords.add(new CustomWord(distance, wordInDictionary));
            }
        });

        Collections.sort(customWords, (CustomWord o1, CustomWord o2) -> o1.getDistance() - o2.getDistance());

        List<String> returnList = new ArrayList<>();
        customWords.forEach((item) -> {
            System.out.println(item.getDistance() + " - " + item.getWord());
            returnList.add(item.getWord());
        });

        return returnList;
    }
}
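
As noted at the top of this answer, the distance check alone can return a lot of words. One possible way to trim the list, sketched here under assumptions: rank the already-scored candidates by distance, break ties by how close their length is to the misspelled word, and keep only the first few. The `SuggestionTrimmer` class, its `Scored` holder, and `topSuggestions` are hypothetical names; `Scored` plays the role of `CustomWord`.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class SuggestionTrimmer {

    // Minimal (word, distance) holder, similar in spirit to CustomWord.
    public static final class Scored {
        final String word;
        final int distance;

        public Scored(String word, int distance) {
            this.word = word;
            this.distance = distance;
        }
    }

    // Keeps the best `limit` words: smallest distance first, then the
    // smallest difference in length from the misspelled word.
    public static List<String> topSuggestions(List<Scored> scored, String misspelledWord, int limit) {
        Comparator<Scored> byDistance = Comparator.comparingInt(s -> s.distance);
        Comparator<Scored> byLengthGap =
                Comparator.comparingInt(s -> Math.abs(s.word.length() - misspelledWord.length()));
        return scored.stream()
                .sorted(byDistance.thenComparing(byLengthGap))
                .limit(limit)
                .map(s -> s.word)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Candidates within distance 2 of the misspelling "zorilta".
        List<Scored> scored = Arrays.asList(
                new Scored("corita", 2), new Scored("gorilla", 2), new Scored("zoril", 2),
                new Scored("zorilla", 1), new Scored("zorillas", 2), new Scored("zorille", 2),
                new Scored("zorillo", 2), new Scored("zorils", 2));
        System.out.println(topSuggestions(scored, "zorilta", 3)); // [zorilla, gorilla, zorille]
    }
}
```

The length tie-breaker is only one heuristic; word frequency or a shared first letter could be used instead.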
SedJ601
  • Kudos for trying the impossible - answering a question that's beyond repair :) – kleopatra Nov 02 '19 at 10:30
  • @Sedrick Yes, it found a lot of words. That said, we are working on a game from the 1956 era called Toss Word, which has fewer possible words, and we tested it with that app. This is way more ArrayLists than we usually deal with. SQL was fun to learn, as I had a friend from Australia who was part of a VB6 help group; she was an owner-developer who did a lot of database work. We learned how to use all the joins and think in SQL. Give me a DB any day, even if this ArrayList is unreal FAST – Vector Nov 06 '19 at 07:50

You just needed to go a little further out into the dictionary.
We are sure you were getting a lot of suggested words from the dictionary.
We tested your code and sometimes it found 3000 or more possible matches. WOW!
So here is the BIG improvement. It still needs a lot of testing; we used this line for our tests, with 100% favorable results.

Tske Charriage to hommaker and hommake as hommaer

Our fear is that if the speller really butchers the word, this improvement might not handle that degree of misspelling.
We are sure you know that if the first letter is wrong this will not work,
like zenophobe for xenophobe.

Here is the BIG improvement tada

     cs.stream().filter(s -> s.startsWith(strSF)
            || s.startsWith(nF, 0)
            && s.length() > 1 && s.length() <= W+3 // <== HERE
            && s.endsWith(nE)
            && s.startsWith(nF)
            && s.contains(nM)) 
    .forEach(list :: add); 

You can send the check to my address 55 48 196 195

James_Duh

This question is a possible duplicate: Search suggestion in strings

I think you should be using something similar to Levenshtein Distance or Jaro-Winkler Distance. If you can use Apache Commons, I would suggest Apache Commons Lang, which has an implementation of Levenshtein Distance. The example below demos this implementation. If you relax the threshold to distance <= 2, you will potentially get more results.

import java.io.File;
import java.io.IOException;
import java.net.URL;
import java.nio.charset.Charset;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;
import org.apache.commons.io.FileUtils;
import org.apache.commons.lang3.StringUtils;

/**
 *
 * @author blj0011
 */
public class Main
{

    public static void main(String[] args)
    {
        try {
            System.out.println("Hello World!");
            File file = new File("alpha.txt");
            if (!file.exists()) {
                FileUtils.copyURLToFile(
                        new URL("https://raw.githubusercontent.com/dwyl/english-words/master/words_alpha.txt"),
                        new File("alpha.txt"),
                        5000,
                        5000);
            }

            List<String> lines = FileUtils.readLines(file, Charset.forName("UTF8"));
            //lines.forEach(System.out::println);

            lines.stream().forEach(line -> {
                int distance = StringUtils.getLevenshteinDistance(line, "zorilta");
                //System.out.println(line + ": " + distance);
                if (distance <= 1) {
                    System.out.println("Did you mean: " + line);
                }
            });

        }
        catch (IOException ex) {
            Logger.getLogger(Main.class.getName()).log(Level.SEVERE, null, ex);
        }
    }
}

Output distance <= 1

Building JavaTestingGround 1.0
------------------------------------------------------------------------

--- exec-maven-plugin:1.5.0:exec (default-cli) @ JavaTestingGround ---
Hello World!
Did you mean: zorilla
------------------------------------------------------------------------
BUILD SUCCESS
------------------------------------------------------------------------
Total time: 1.329 s
Finished at: 2019-11-01T11:02:48-05:00
Final Memory: 7M/30M

Distance <= 2

Hello World!
Did you mean: corita
Did you mean: gorilla
Did you mean: zoril
Did you mean: zorilla
Did you mean: zorillas
Did you mean: zorille
Did you mean: zorillo
Did you mean: zorils
------------------------------------------------------------------------
BUILD SUCCESS
------------------------------------------------------------------------
Total time: 1.501 s
Finished at: 2019-11-01T14:03:33-05:00
Final Memory: 7M/34M

See the possible duplicate for more details about Levenshtein Distance.
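
For readers who want to see what the library call computes, here is a minimal plain-Java sketch of the classic dynamic-programming Levenshtein distance. It is an illustration with hypothetical names (`LevenshteinSketch`, `distance`), not the Apache Commons implementation.

```java
public class LevenshteinSketch {

    // Minimum number of single-character insertions, deletions, and
    // substitutions needed to turn a into b.
    public static int distance(String a, String b) {
        int[] previous = new int[b.length() + 1];
        int[] current = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            previous[j] = j; // building the first j chars of b from "" takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            current[0] = i; // reducing the first i chars of a to "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                current[j] = Math.min(
                        Math.min(current[j - 1] + 1, previous[j] + 1), // insert or delete
                        previous[j - 1] + cost);                       // match or substitute
            }
            int[] swap = previous; // reuse the two rows instead of a full matrix
            previous = current;
            current = swap;
        }
        return previous[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("zorilta", "zorilla"));     // 1
        System.out.println(distance("charriage", "carriage"));  // 1
        System.out.println(distance("zenophobe", "xenophobe")); // 1
    }
}
```

Because the distance counts edits anywhere in the word, an extra letter at position 2 or a wrong first letter costs the same as an error at the end, which is the case a prefix-based filter cannot handle.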

SedJ601
  • Your answer is great. We thought about Levenshtein Distance but just could not wrap our heads around it. Back to your answer: where you have file, Charset.forName("UTF8")); in place of file we would like to use the misspelled word. I may be lost, but if we use file I am not sure this will work, as we know that the misspelled word is not in the file (the dictionary). We are trying to change file to word, but without much success so far – Vector Nov 01 '19 at 19:37
  • I don't understand your comment. The file part just allows the program to read the file that you have on `GitHub`. The contents of that file are added to a `List`. If you uncomment `//lines.forEach(System.out::println);`, you will see the content that is in the file you posted on `GitHub`. The program uses `int distance = StringUtils.getLevenshteinDistance(line, "zorilta");` to find the `Levenshtein Distance` of every word in the List stream. In this case, `zorilta` is the misspelled word. – SedJ601 Nov 01 '19 at 20:02
  • OK, I am lost; disregard my comment. I am trying to implement your code, but getLevenshteinDistance is telling me "can't find symbol". The code is in a new test project. I will post what I have if it helps; meanwhile we will try to fix the warning – Vector Nov 01 '19 at 20:13
  • Downloaded the JAR commons-lang 3.3.9.jar and added it to the library and classpath, with no improvement. Will do some reading. Thanks once again – Vector Nov 01 '19 at 20:57
  • getLevenshteinDistance is deprecated, any thoughts? We created a Main class and the code looks good except for the deprecation. We are lost now – Vector Nov 01 '19 at 21:35
  • I just noticed that. You need to get it via `org.apache.commons.text` – SedJ601 Nov 01 '19 at 21:40
  • I just started using Maven, but you can download the jar binaries – SedJ601 Nov 01 '19 at 21:55
  • Because this is JavaFX, the IDE is going crazy trying to configure the code, and it has me running in circles. Any thoughts are welcome, but I am at the point of having no idea what I am doing – Vector Nov 01 '19 at 22:50
  • What IDE are you using? What version of Java? – SedJ601 Nov 03 '19 at 01:09
  • We are using NetBeans 8.2 and Java 1.8 – Vector Nov 04 '19 at 02:23
  • We just posted some improvements and new code, but nothing anywhere near what your answer was. Will continue to look at LevenshteinDistance. Thanks – Vector Nov 04 '19 at 03:04