A good way to measure the similarity of two arrays is to sort both, then iterate over all elements of one array while keeping a cursor on the second array, such that at any time the current element of the iterated array is not greater than the element at the cursor position.
As you might expect, this algorithm requires the elements to be comparable, so it works only if the arrays' element type conforms to the Comparable protocol.
I've written a generic function that performs that calculation; here it is:
func compare<T: Comparable>(_ lhs: [T], _ rhs: [T]) -> (matches: Int, total: Int) {
    // Mutable local copies, so the in-place sorts do not affect the caller
    var lhs = lhs
    var rhs = rhs
    lhs.sort() // In-place sort
    rhs.sort() // In-place sort
    var matches = 0
    var rightSequence = rhs.makeIterator()
    var right = rightSequence.next()
    for left in lhs {
        // Skip right elements that are smaller than the current left element
        while let r = right, left > r {
            right = rightSequence.next()
        }
        // A match consumes the right element, so it is counted only once
        if let r = right, left == r {
            matches += 1
            right = rightSequence.next()
        }
    }
    return (matches: matches, total: max(lhs.count, rhs.count))
}
Let me say that the implementation can probably be optimized, but my goal here is to show the algorithm, not to provide its best implementation.
The first thing to do is to obtain a sorted version of each of the two arrays. For simplicity, I work on var copies of both parameters, which allows me to edit them while keeping all changes in the local scope. That's why I am using an in-place sort.
An iterator over the second array, called rightSequence, is created, and its first element is extracted and copied into the right variable.
Then the first array is iterated over. For each element, the iterator is advanced until the left element is no longer greater than the right one, or until the second array is exhausted.
Once this is done, left and right are compared for equality. If they are equal, the counter of matches is incremented and the iterator moves on, so each element of the second array is matched at most once.
The algorithm works for arrays having repetitions, different sizes, etc.
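To make the last claim concrete, here is a minimal sketch of the same sorted-merge idea written with two integer indices instead of an iterator, followed by a few sample calls. The name similarity is hypothetical and chosen only for this illustration, and the sketch assumes a Swift 5 toolchain; treat it as one possible variant rather than a better implementation.

```swift
// Hypothetical two-index variant of the sorted-merge comparison.
// `similarity(_:_:)` is an illustrative name, not part of the function above.
func similarity<T: Comparable>(_ lhs: [T], _ rhs: [T]) -> (matches: Int, total: Int) {
    let left = lhs.sorted()   // sorted copies; the inputs stay untouched
    let right = rhs.sorted()
    var i = 0, j = 0, matches = 0
    while i < left.count && j < right.count {
        if left[i] < right[j] {
            i += 1            // left element has no partner; skip it
        } else if left[i] > right[j] {
            j += 1            // advance the cursor on the second array
        } else {
            matches += 1      // equal elements pair up exactly once
            i += 1
            j += 1
        }
    }
    return (matches: matches, total: max(left.count, right.count))
}

// Repetitions: each 2 on the left pairs with one 2 on the right.
let a = similarity([1, 2, 2, 3], [2, 2, 4])
print(a.matches, a.total)   // 2 4

// Different sizes and unsorted input.
let b = similarity([5, 1, 3], [3, 5])
print(b.matches, b.total)   // 2 3

// An empty array matches nothing.
let c = similarity([Int](), [1, 2])
print(c.matches, c.total)   // 0 2
```

Dividing matches by total gives a ratio between 0 and 1, if a single similarity score is more convenient than the pair.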