I have a text file containing: mariam amr sara john jessy salma mkkkkkaooooorllll

The user enters a word to search for, for example: maram

As you can see, it does not exist in my text file. I want to offer suggestions: the word in the file most similar to maram is mariam.

I used the longest common subsequence, but it returns both mariam and mkkkkkaooooorllll, because both contain the longest common subsequence "mar".

I want to force the choice of mariam only. Any ideas?

Thanks in advance.

/**
 * Java program implementing the Longest Common Subsequence algorithm.
 */

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class LongestCommonSubsequence {

    /** Returns one longest common subsequence of str1 and str2. */
    public String lcs(String str1, String str2) {
        int l1 = str1.length();
        int l2 = str2.length();

        // arr[i][j] = length of the LCS of str1.substring(i) and str2.substring(j)
        int[][] arr = new int[l1 + 1][l2 + 1];

        // Fill the table from the ends of both strings towards the start
        for (int i = l1 - 1; i >= 0; i--) {
            for (int j = l2 - 1; j >= 0; j--) {
                if (str1.charAt(i) == str2.charAt(j)) {
                    arr[i][j] = arr[i + 1][j + 1] + 1;
                } else {
                    arr[i][j] = Math.max(arr[i + 1][j], arr[i][j + 1]);
                }
            }
        }

        // Walk the table from the top-left corner to rebuild the subsequence
        int i = 0, j = 0;
        StringBuilder sb = new StringBuilder();
        while (i < l1 && j < l2) {
            if (str1.charAt(i) == str2.charAt(j)) {
                sb.append(str1.charAt(i));
                i++;
                j++;
            } else if (arr[i + 1][j] >= arr[i][j + 1]) {
                i++;
            } else {
                j++;
            }
        }

        // TODO: read the text file; if a word contains sb.toString(), print it
        return sb.toString();
    }

    /** Main function: reads two strings from standard input and prints their LCS. */
    public static void main(String[] args) throws IOException {
        BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
        System.out.println("Longest Common Subsequence Algorithm Test\n");

        System.out.println("\nEnter string 1");
        String str1 = br.readLine();

        System.out.println("\nEnter string 2");
        String str2 = br.readLine();

        LongestCommonSubsequence obj = new LongestCommonSubsequence();
        String result = obj.lcs(str1, str2);

        System.out.println("\nLongest Common Subsequence : " + result);
    }
}

CodeX
  • I think that's because only `mariam` and `mkkk..` start with an `m`. I bet your algorithm checks character by character from the beginning. Please show us the code – Toumash Jul 01 '15 at 10:55
  • @Toumash it's because both have the longest common subsequence mar, and I will add the code – CodeX Jul 01 '15 at 10:56
  • I can't answer, but I will upvote so people will see – Toumash Jul 01 '15 at 11:05

1 Answer

There are a few techniques for fuzzy matching like this. Apache Commons provides some excellent tools for measuring how similar two strings are to one another. Check out the Javadoc for the Levenshtein distance and Jaro-Winkler distance calculation methods.

With Levenshtein distance, the lower the score, the more similar the strings are:

StringUtils.getLevenshteinDistance("frog", "fog") == 1
StringUtils.getLevenshteinDistance("fly", "ant") == 3
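
If you would rather not pull in a library, the distance itself is short to write. Below is a minimal sketch in plain Java; the class name `LevenshteinSketch` and the method `distance` are made up for illustration and are not the Commons implementation. It counts the minimum number of single-character insertions, deletions and substitutions needed to turn one string into the other.

```java
/** Illustrative helper (hypothetical name), not the Apache Commons code. */
final class LevenshteinSketch {

    /** Minimum number of single-character edits turning a into b. */
    static int distance(String a, String b) {
        // prev[j] holds the distance between the first i-1 chars of a
        // and the first j chars of b; curr is the row being filled.
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // empty prefix of a: j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // empty prefix of b: i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1), // insert, delete
                        prev[j - 1] + cost);                    // match or substitute
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("frog", "fog"));                // 1
        System.out.println(distance("maram", "mariam"));            // 1
        System.out.println(distance("maram", "mkkkkkaooooorllll")); // 14
    }
}
```

Unlike the longest common subsequence, this measure penalises every extra character, so mkkkkkaooooorllll ends up far from maram even though both share "mar".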

You could also consider calculating the Double Metaphone for each string. This lets you determine how similar the strings sound when spoken, even if they are not spelt similarly.

Back to your question: using these tools, you could offer suggestions whenever the user's search term is within a certain threshold of any of the strings in your text file.
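
As a rough sketch of that threshold idea, assuming the words from the file have already been read into a list: the class `SuggestionSketch`, the method `suggest` and the `maxDistance` parameter are hypothetical names chosen for this example, and the comparison is case-sensitive. The distance is a hand-written Levenshtein so the snippet runs without any library; you could swap in the Commons method instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Illustrative only: suggests words within an edit-distance threshold. */
final class SuggestionSketch {

    /** Plain Levenshtein distance (insertions, deletions, substitutions). */
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                        prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Returns the words whose distance to term is at most maxDistance, closest first. */
    static List<String> suggest(String term, List<String> words, int maxDistance) {
        List<String> hits = new ArrayList<>();
        for (String word : words) {
            if (distance(term, word) <= maxDistance) {
                hits.add(word);
            }
        }
        hits.sort(Comparator.comparingInt((String w) -> distance(term, w)));
        return hits;
    }

    public static void main(String[] args) {
        // The words from the question's text file
        List<String> words = Arrays.asList(
                "mariam", "amr", "sara", "john", "jessy", "salma", "mkkkkkaooooorllll");
        System.out.println(suggest("maram", words, 1)); // [mariam]
    }
}
```

The threshold is a tuning choice: with a maximum distance of 1 only mariam is suggested for maram, while raising it to 2 also lets sara through.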

rcgeorge23