1

In my Swift 3.0 app, I want to determine the best name for something by finding the longest common substring of 6 to 12 strings.

Example strings:

ON/OFF office lights
DIM office lights
VALUE office lights
FB office lights
FB VALUE office lights

Desired output:

office lights

I've come across multiple StackOverflow answers for the longest subsequence but haven't been able to adapt any of them to my needs..

Any help would be greatly appreciated!

roy
  • 6,685
  • 3
  • 26
  • 39
Bram Roelandts
  • 470
  • 7
  • 25
  • 4
    I will be the obvious guy here and forward you to Wikipedia. [This article](https://en.wikipedia.org/wiki/Longest_common_substring_problem) has a very good pseudocode which you can adapt to Swift. – Tamás Sengel Aug 05 '17 at 18:02
  • @the4kman I spent half an hour trying, but my knowledge about arrays in Swift is too limited to accomplish this.. – Bram Roelandts Aug 05 '17 at 19:03
  • I updated my answer , sorry for 1hr delay @BramRoelandts – roy Aug 07 '17 at 05:29

1 Answers1

5

I converted Java & C++ code into Swift 3 , collected from GeeksForGeeks Longest Common Subsequence & Longest Common Substring.

It works !

class LongestCommon
{
    // Returns length of LCS for X[0..m-1], Y[0..n-1]
    private static func lcSubsequence(_ X : String  , _ Y : String  ) -> String
    {
        let m = X.characters.count
        let n = Y.characters.count

        var L = Array(repeating: Array(repeating: 0, count: n + 1 ) , count: m + 1)
        // Following steps build L[m+1][n+1] in bottom up fashion. Note
        // that L[i][j] contains length of LCS of X[0..i-1] and Y[0..j-1]
        for i in stride(from: 0, through: m, by: 1)
        {
            for j in stride(from: 0, through: n, by: 1)
            {
                if i == 0 || j == 0
                {
                    L[i][j] = 0;
                }
                else if X[X.index( X.startIndex , offsetBy: (i - 1) )] == Y[Y.index( Y.startIndex , offsetBy: (j - 1) )]
                {
                    L[i][j] = L[i-1][j-1] + 1
                }
                else
                {
                   L[i][j] = max(L[i-1][j], L[i][j-1])
                }
            }

        }

        // Following code is used to print LCS
        var index = L[m][n]
        // Create a character array to store the lcs string
        var lcs = ""
        // Start from the right-most-bottom-most corner and
        // one by one store characters in lcs[]
        var i = m
        var j = n

        while (i > 0 && j > 0)
        {
            // If current character in X[] and Y are same, then
            // current character is part of LCS
            if X[X.index( X.startIndex , offsetBy: (i - 1) )] == Y[Y.index( Y.startIndex , offsetBy: (j - 1) )]
            {
                lcs.append(X[X.index( X.startIndex , offsetBy: (i - 1) )])
                i-=1
                j-=1
                index-=1
            }
            // If not same, then find the larger of two and
            // go in the direction of larger value
            else if (L[i-1][j] > L[i][j-1])
            {
                i-=1
            }
            else
            {
                j-=1
            }
        }

        // return the lcs
        return String(lcs.characters.reversed())
    }

    // Returns length of LCS for X[0..m-1], Y[0..n-1]
    private static func lcSubstring(_ X : String  , _ Y : String  ) -> String
    {
        let m = X.characters.count
        let n = Y.characters.count

        var L = Array(repeating: Array(repeating: 0, count: n + 1 ) , count: m + 1)
        var result : (length : Int, iEnd : Int, jEnd : Int) = (0,0,0)
        // Following steps build L[m+1][n+1] in bottom up fashion. Note
        // that L[i][j] contains length of LCS of X[0..i-1] and Y[0..j-1]
        for i in stride(from: 0, through: m, by: 1)
        {
            for j in stride(from: 0, through: n, by: 1)
            {
                if i == 0 || j == 0
                {
                    L[i][j] = 0;
                }
                else if X[X.index( X.startIndex , offsetBy: (i - 1) )] == Y[Y.index( Y.startIndex , offsetBy: (j - 1) )]
                {
                    L[i][j] = L[i-1][j-1] + 1

                    if result.0 < L[i][j]
                    {
                        result.length = L[i][j]
                        result.iEnd = i
                        result.jEnd = j
                    }
                }
                else
                {
                    L[i][j] = 0 //max(L[i-1][j], L[i][j-1])
                }
            }

        }

        // Following code is used to print LCS


        let lcs = X.substring(with: X.index(X.startIndex, offsetBy: result.iEnd-result.length)..<X.index(X.startIndex, offsetBy: result.iEnd))

        // return the lcs
        return lcs
    }

    // driver program

    class func subsequenceOf(_ strings : [String] ) -> String
    {
        var answer = strings[0] // For on string answer is itself

        for i in stride(from: 1, to: strings.count, by: 1)
        {
            answer = lcSubsequence(answer,strings[i])
        }
        return answer
    }

    class func substringOf(_ strings : [String] ) -> String
    {
        var answer = strings[0] // For on string answer is itself

        for i in stride(from: 1, to: strings.count, by: 1)
        {
            answer = lcSubstring(answer,strings[i])
        }
        return answer
    }



}

Usage :

let strings = ["ON/OFF office lights",
                       "DIM office lights",
                       "VALUE office lights",
                       "FB office lights",
                       "FB VALUE office lights"]
print(LongestCommon.subsequenceOf(strings))
print(LongestCommon.substringOf(strings))
roy
  • 6,685
  • 3
  • 26
  • 39
  • I tested this code some more and this is actually the longest common subsequence of an array of strings, not the longest common **substring**. When passing ["ON/OFF office lights", "FB office lights"], the output will be "F office lights". Any idea how I could tweak your code for that purpose? – Bram Roelandts Aug 06 '17 at 13:09
  • @BramRoelandts Check My answer Now ! – roy Aug 06 '17 at 13:53
  • I tried your new answer and I actually still get the same results. When passing an array of strings with five items, it returns "office lights" as desired, but when passing only two elements (["ON/OFF office lights", "FB office lights"]), it returns "F office lights" for both substring and subsequence. Am I doing something wrong? – Bram Roelandts Aug 06 '17 at 14:01
  • Actually you are looking for Longest Common Sub Words not Substring nor Subsequence , By Algorithm my code is correct, I believe ! Substring and Subsequence finding basically works of Characters not words generally. btw. . It is still applicable and easier. – roy Aug 06 '17 at 15:25
  • Well, not entirely - I generally don't care about words or characters. When you look carefully, if "F office lights" is the longest common substring between ["ON/OFF office lights", "FB office lights"], your algorithm dropped the 'B' in the second string. All characters in the common substring have to immediately follow each other, there can't be dropped ones.. – Bram Roelandts Aug 06 '17 at 15:28
  • Oh,I see, as I am away from my mac, I will update my code by tommorow 10:30 a.m. GMT+6 – roy Aug 06 '17 at 15:34
  • Thanks a lot for putting in the effort of helping me with this! Curious to see if you'll manage it, as I sure as hell wouldn't know how.. – Bram Roelandts Aug 06 '17 at 15:35