I have an object $Posts which contains a title and a SimTitles field, among a few others. I need to compare each title to the other titles and record a similarity score in the SimTitles field. So if I have 80 $Posts, that means 6,400 comparisons (80 × 80), since each title is scored against every title.
Apart from the Measure-TitleSimilarity routine, which I believe is optimized, can anyone see a way to improve the speed of this double loop that I am missing?
Edit: I have included the function Measure-TitleSimilarity. I am actually passing arrays of words to the function (each title split on spaces). The whole topic of quantifying arrays for likeness is fascinating. I have also tried Title.ToCharArray(), which pushes the magic number (the .2 threshold) much higher. It can also produce a match between two completely different titles as long as the characters are the same. (Ex: 'Mother Teresa' would closely match 'Earthmovers' or 'Thermometer', yet they clearly do not have the same meaning.) Cosine similarity is just one method, but it seemed the easiest to process. @Mclayton and @bryancook - I see the light with your suggestion, but I can't grasp how to track which titles no longer need to be looked at for similar words.
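One possible reading of that suggestion, as a minimal sketch: since the score for (A, B) is the same as for (B, A), the inner loop can start at the index after the outer one, so the loop indices do the tracking and each pair is scored once. That is 80 × 79 / 2 = 3,160 comparisons instead of 6,400. The sample data and the `Test-Similar` helper below are hypothetical stand-ins, not part of my script; in practice the helper would call Measure-TitleSimilarity.

```powershell
# Hypothetical sample data standing in for the real $Posts objects.
$Posts = @(
    [pscustomobject]@{ Title = 'Red fox jumps high'; SimTitles = 0 }
    [pscustomobject]@{ Title = 'Red fox runs fast';  SimTitles = 0 }
    [pscustomobject]@{ Title = 'Blue whale sings';   SimTitles = 0 }
)

# Split every title once up front instead of on every comparison.
$Words = [object[]]::new($Posts.Count)
for ($i = 0; $i -lt $Posts.Count; $i++) {
    $Words[$i] = $Posts[$i].Title.Split(' ')
}

# Stand-in scorer (an assumption): shared words / sqrt(count1 * count2),
# which equals the cosine score when no word repeats within a title.
function Test-Similar($Words1, $Words2) {
    $shared = @($Words1 | Where-Object { $Words2 -contains $_ }).Count
    return ($shared / [Math]::Sqrt($Words1.Count * $Words2.Count)) -gt .2
}

# $j starts after $i, so each pair is visited once and both posts are credited.
for ($i = 0; $i -lt $Posts.Count; $i++) {
    for ($j = $i + 1; $j -lt $Posts.Count; $j++) {
        if (Test-Similar $Words[$i] $Words[$j]) {
            $Posts[$i].SimTitles++
            $Posts[$j].SimTitles++
        }
    }
}

$Posts | Format-Table Title, SimTitles
```

Note that the original double loop also scores each title against itself, which always passes the threshold. This sketch skips that self-comparison, so its counts come out one lower per post; adding 1 to every SimTitles afterwards would reproduce the original numbers.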
Function Get-SimTitles([psobject]$NewPosts) {
    $CkTitles = $NewPosts.title
    foreach ($Ck in $CkTitles) {
        $NewPosts | & {
            process {
                # The call is wrapped in its own parentheses so that -gt compares its result
                if ((Measure-TitleSimilarity $Ck.Split(' ') $_.title.Split(' ')) -gt .2) {
                    $_.SimTitles = $_.SimTitles + 1
                }
            }
        }
    }
}
Function Measure-TitleSimilarity
{
    ## Based on VectorSimilarity by .AUTHOR Lee Holmes
    ## Modified slightly to match use
    [CmdletBinding()]
    param(
        [Parameter(Position = 0)]
        $Title1,
        [Parameter(Position = 1)]
        $Title2
    )
    $allkeys = @($Title1) + @($Title2) | Sort-Object -Unique
    $set1Hash = @{}
    $set2Hash = @{}
    $setsToProcess = @($Title1, $set1Hash), @($Title2, $set2Hash)
    foreach($set in $setsToProcess)
    {
        $set[0] | Foreach-Object {
            $value = 1
            $set[1][$_] = $value
        }
    }
    $dot = 0
    $mag1 = 0
    $mag2 = 0
    foreach($key in $allkeys)
    {
        $dot += $set1Hash[$key] * $set2Hash[$key]
        $mag1 += ($set1Hash[$key] * $set1Hash[$key])
        $mag2 += ($set2Hash[$key] * $set2Hash[$key])
    }
    $mag1 = [Math]::Sqrt($mag1)
    $mag2 = [Math]::Sqrt($mag2)
    return [Math]::Round($dot / ($mag1 * $mag2), 3)
}
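Because every word gets the value 1 in the function above, the dot product is just the number of distinct words the two titles share, and each magnitude is the square root of that title's distinct word count. A minimal sketch of that shortcut follows, using .NET hash sets rather than Sort-Object and the pipeline. The name `Measure-WordSetSimilarity` is hypothetical, and the case-insensitive ordinal comparer is an assumption meant to approximate the default case-insensitive hashtable keys; culture-specific edge cases may differ.

```powershell
function Measure-WordSetSimilarity {
    param(
        [string[]]$Title1,
        [string[]]$Title2
    )
    # Assumption: ordinal, case-insensitive matching stands in for @{} key lookup.
    $cmp  = [System.StringComparer]::OrdinalIgnoreCase
    $set1 = [System.Collections.Generic.HashSet[string]]::new($Title1, $cmp)
    $set2 = [System.Collections.Generic.HashSet[string]]::new($Title2, $cmp)

    # Avoid dividing by zero when either title has no words.
    if ($set1.Count -eq 0 -or $set2.Count -eq 0) { return 0 }

    $size1 = $set1.Count
    $set1.IntersectWith($set2)   # $set1 now holds only the shared words

    # shared / (sqrt(size1) * sqrt(size2)), rounded like the original function
    return [Math]::Round($set1.Count / [Math]::Sqrt($size1 * $set2.Count), 3)
}

$a = 'Red fox jumps high'.Split(' ')
$b = 'Red fox runs fast'.Split(' ')
Measure-WordSetSimilarity $a $b   # 2 shared words / sqrt(4 * 4) = 0.5
```

Building the sets once per title, outside the comparison loop, would save more work still, since each title's word set never changes between comparisons.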