It's not really clear whether the string has to be considered divided in words by separators it may contains, or if you're just looking for a specific substring occurrence.
Anyway both cases could be addressed in this way in my opinion:
extension String {
func enumerateOccurencies(of pattern: String, _ body: (Range<String.Index>, inout Bool) throws -> Void) rethrows {
guard
!pattern.isEmpty,
count >= pattern.count
else { return }
var stop = false
var lo = startIndex
while !stop && lo < endIndex {
guard
let r = self[lo..<endIndex].range(of: pattern)
else { break }
try body(r, &stop)
lo = r.upperBound
}
}
}
You'll then set stop
to true in the body
closure once reached the desired occurrence number and capture the range
passed to it:
let words = "word1, word1, word2, word3, word1, word3"
var matches = 0
var rangeOfThirdOccurencyOfWord1: Range<String.Index>? = nil
words.enumerateOccurencies(of: "word1") { range, stop in
matches +=1
stop = matches == 3
if stop {
rangeOfThirdOccurencyOfWord1 = range
}
}
Regarding the DFA: recently I've wrote one leveraging on Hashable and using a an Array of Dictionaries as its state nodes, but I've found that the method above is faster, cause maybe range(of:)
uses finger-printing.
UPDATE
Otherwise you could also achieve that API you've mentioned in this way:
import Foundation
extension String {
func rangeOfWord(order: Int, separator: String) -> Range<String.Index>? {
precondition(order > 0)
guard
!isEmpty,
!separator.isEmpty,
separator.count < count
else { return nil }
var wordsSoFar = 0
var lo = startIndex
while let r = self[lo..<endIndex].range(of: separator) {
guard
r.lowerBound != lo
else {
lo = r.upperBound
continue
}
wordsSoFar += 1
guard
wordsSoFar < order
else { return lo..<r.lowerBound }
lo = r.upperBound
}
if
lo < endIndex,
wordsSoFar + 1 == order
{
return lo..<endIndex
}
return nil
}
}
let words = "word anotherWord oneMore lastOne"
if let r = words.rangeOfWord(order: 4, separator: " ") {
print(words[r])
} else {
print("not found")
}
Here order
parameter refers to the nth order of the word in the string, starting from 1. I've also added the separator
parameter to specify a string token to use for finding words in the string (it can also be defaulted to " "
to be able to call the function without having to specify it).