1

Given a string of arbitrary length, I need to find a substring that is immediately followed by an identical copy of itself (two identical substrings in a row).

My attempt (two functions that together do one job) turned out complex and cumbersome, and was not suitable for that reason. The function I need should be simple and short.

Example:

Input : str = "abcabc"
Output : abc

Input : str = "aa"
Output : a

Input : str = "abcbabcb"
Output : abcb

Input : str = "abcbca"
Output : bc

Input : str = "cbabc"
Output : 

Input : str = "acbabc"
Output :

My unsuccessful function:

func findRepetition(_ p: String) -> [String:Int] {
    var repDict: [String:Int] = [:]
    var p = p
    while p.count != 0 {
        for i in 0...p.count-1 {
            repDict[String(Array(p)[0..<i]), default: 0] += 1
        }
        p = String(p.dropFirst())
    }
    return repDict
}

var correctWords = [String]()
var wrongWords = [String]()
func getRepeats(_ p: String) -> Bool {
    let p = p
    var a = findRepetition(p)
    for i in a {
        var substring = String(Array(repeating: i.key, count: 2).joined())
        if p.contains(substring) {
            wrongWords.append(p)
            return false
        }
    }
    correctWords.append(p)
    return true
}

I will be very grateful for your help!

Coffee inTime

3 Answers

1

Here's a solution using a regular expression. The capture group matches as many characters as possible such that the whole group is immediately repeated at least once.

import Foundation

func findRepetition(_ s: String) -> String? {
    if s.isEmpty { return nil }
    // Group 1 is the repeated unit; \1+ requires it to occur again right away.
    let pattern = "([a-z]+)\\1+"
    let regex = try? NSRegularExpression(pattern: pattern, options: [])
    let fullRange = NSRange(location: 0, length: s.utf16.count)
    if let match = regex?.firstMatch(in: s, options: [], range: fullRange) {
        let unitRange = match.range(at: 1)
        return (s as NSString).substring(with: unitRange)
    }
    return nil
}

// The result is optional, so unwrap it before printing.
print(findRepetition("abcabc") ?? "nil") //prints abc
print(findRepetition("aa") ?? "nil") //prints a
print(findRepetition("abcbabcb") ?? "nil") //prints abcb
print(findRepetition("abcbca") ?? "nil") //prints bc
print(findRepetition("cbabc") ?? "nil") //prints nil
print(findRepetition("acbabc") ?? "nil") //prints nil
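The character class `[a-z]` in the pattern above limits the repeated unit to lowercase ASCII letters, so an input containing spaces or capitals would not match. A possible variant, shown only as a sketch, swaps in `.` for the character class; the function name `findRepetitionAnyCharacters` is made up for this illustration and is not part of the answer.

```swift
import Foundation

// Hypothetical variant: "." lets the repeated unit contain any character
// except a line break, instead of only lowercase ASCII letters.
func findRepetitionAnyCharacters(_ s: String) -> String? {
    guard let regex = try? NSRegularExpression(pattern: "(.+)\\1") else { return nil }
    let fullRange = NSRange(s.startIndex..., in: s)
    guard let match = regex.firstMatch(in: s, options: [], range: fullRange),
          let unitRange = Range(match.range(at: 1), in: s) else { return nil }
    return String(s[unitRange])
}

print(findRepetitionAnyCharacters("Ab cdAb cd") ?? "nil")
```

As with the original pattern, only the first match in the string is reported, so a short repetition early in the string wins over a longer one later on.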
Hong Wei
0
import Foundation

// Counts how often every substring of up to half the string's length occurs.
func findRepetitions(_ p: String) -> [String: Int] {
    let text = p as NSString  // plain bridging cast; `as!` is unnecessary here
    let half = text.length / 2 + 1
    var result: [String: Int] = [:]
    for i in 1..<half {                   // i: substring length
        for j in 0...(text.length - i) {  // j: start offset
            let sub = text.substring(with: NSRange(location: j, length: i))
            result[sub, default: 0] += 1
        }
    }
    return result
}

This counts how often each possible substring occurs in your string. I hope it helps.

  • Hello, thanks for the feedback. In the case of the abcabc line, I get ["bc": 2, "abc": 2, "a": 2, "bca": 1, "cab": 1, "ab": 2, "ca": 1, "c": 2, "b": 2], but how do I understand which of this comes in a row? (In my case, the function should return only "abc") – Coffee inTime Aug 06 '19 at 10:49
  • The longest and most repeated substring is what you need. – Cao Khắc Lê Duy Aug 06 '19 at 10:53
  • In the example with the string "abcabc", the function returns "bc", "abc", "a" = 2. – Coffee inTime Aug 06 '19 at 10:56
  • "abc" is the longest :) – Cao Khắc Lê Duy Aug 06 '19 at 10:57
  • Yes, but I do not need the longest sequence. I need to determine whether there are two identical subsequences in a row. abcababc = ab. abcabc = abc. ababcb = ab. aa = a. – Coffee inTime Aug 06 '19 at 10:59
  • omg, that dictionary already gives you the repetitions; you can detect which one has 2 identical subsequences based on the counts. – Cao Khắc Lê Duy Aug 06 '19 at 11:02
  • Sorry, I have already implemented this and it does not fit. I do not need the same function. The correct function should compare each character with the next. Then two characters with the next two and so on. – Coffee inTime Aug 06 '19 at 17:38
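
The approach described in the last comment (compare each character with the next, then each pair with the next pair, and so on) could be sketched roughly as follows. This is an illustration only, not code from the thread; the name `firstAdjacentRepeat` is hypothetical, and it assumes that the shortest repeated unit should win when several exist.

```swift
// Hypothetical helper: returns the first unit that is immediately followed
// by a copy of itself, trying shorter units before longer ones.
func firstAdjacentRepeat(in text: String) -> String? {
    let chars = Array(text)
    let n = chars.count
    guard n >= 2 else { return nil }
    for length in 1...(n / 2) {                // size of the candidate unit
        for start in 0...(n - 2 * length) {    // where the first copy begins
            let first = chars[start..<(start + length)]
            let second = chars[(start + length)..<(start + 2 * length)]
            if first.elementsEqual(second) {
                return String(first)
            }
        }
    }
    return nil
}

print(firstAdjacentRepeat(in: "abcabc") ?? "nil")   // prints abc
print(firstAdjacentRepeat(in: "abcababc") ?? "nil") // prints ab
print(firstAdjacentRepeat(in: "cbabc") ?? "nil")    // prints nil
```

Because lengths are tried in increasing order, a string such as "aaaa" yields "a" rather than "aa"; iterating the lengths in reverse would prefer the longest unit instead.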
0

Here is a solution based on the suffix array algorithm that finds the longest substring that is repeated contiguously:

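import Foundation  // provides commonPrefix(with:) on String and Substring

func longestRepeatedSubstring(_ str: String) -> String {

    // Start indices of all suffixes, ordered by comparing the suffixes themselves.
    let sortedSuffixIndices = str.indices.sorted { str[$0...] < str[$1...] }
    // For each pair of neighbouring suffixes: the common prefix length, kept only
    // when that prefix is immediately followed by a copy of itself.
    let lcsArray = [0]
        +
        sortedSuffixIndices.indices.dropFirst().map { index in
            let suffix1 = str[sortedSuffixIndices[index]...]
            let suffix2 = str[sortedSuffixIndices[index - 1]...]
            let commonPrefix = suffix1.commonPrefix(with: suffix2)
            let count = commonPrefix.count
            let repeated = suffix1.dropFirst(count).commonPrefix(with: commonPrefix)
            return count == repeated.count ? count : 0
        }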

    let maxRepeated = zip(sortedSuffixIndices.indices,lcsArray).max(by: { $0.1 < $1.1 })

    if let tuple = maxRepeated, tuple.1 != 0 {
        let suffix1 = str[sortedSuffixIndices[tuple.0 - 1]...]
        let suffix2 = str[sortedSuffixIndices[tuple.0]...]
        let longestRepeatedSubstring = suffix1.commonPrefix(with: suffix2)
        return longestRepeatedSubstring
    } else {
        return ""
    }
}

Here is an easy-to-understand tutorial about such an algorithm.

It works for these examples:

longestRepeatedSubstring("abcabc")    //"abc"
longestRepeatedSubstring("aa")        //"a"
longestRepeatedSubstring("abcbabcb")  //"abcb"
longestRepeatedSubstring("abcbca")    //"bc"
longestRepeatedSubstring("cbabc")     //""
longestRepeatedSubstring("acbabc")    //""

As well as these:

longestRepeatedSubstring("acac")    //"ac"
longestRepeatedSubstring("Ab cdAb cd")  //"Ab cd"
longestRepeatedSubstring("aabcbc")      //"bc"

Benchmarks

Here is a benchmark that clearly shows that the Suffix Array algorithm is much faster than using a regular expression.

The result is:

Regular expression: 7.2 ms
Suffix Array      : 0.1 ms
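
The benchmark code itself is not reproduced here. As a rough sketch of how timings like these might be collected, one could average wall-clock time over many runs; the helper `averageMilliseconds` and the placeholder workload below are assumptions made for illustration, not the setup that produced the numbers above.

```swift
import Foundation

// Hypothetical helper: average wall-clock time of `body` in milliseconds.
func averageMilliseconds(iterations: Int, _ body: () -> Void) -> Double {
    precondition(iterations > 0, "iterations must be positive")
    let start = Date()
    for _ in 0..<iterations { body() }
    return Date().timeIntervalSince(start) * 1000 / Double(iterations)
}

// Placeholder workload; substitute a call to the function under test.
let sample = String(repeating: "abcb", count: 50)
let ms = averageMilliseconds(iterations: 100) {
    _ = sample.reversed().count
}
print("average: \(ms) ms")
```

Results depend heavily on input size, optimization level, and hardware, so figures from different setups are not directly comparable.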
ielyamani