I'm currently working on a PowerShell script that is going to be used in TeamCity as part of a build step. The script has to:
- recursively check all files with a certain extension (.item) within a folder,
- read the third line of each file (which contains a GUID) and check whether any of these lines are duplicates,
- log the path of each file that contains a duplicate GUID, along with the GUID itself,
- make the TeamCity build fail if one or more duplicates are found.
I am completely new to PowerShell, but so far I've written something that does what I expect it to do:
Write-Host "Start checking for Unicorn serialization errors."
# @() keeps $files an array even when only one file is found
$files = @(Get-ChildItem "%system.teamcity.build.workingDir%\Sitecore\serialization" -Recurse -Include *.item | Where-Object { -not $_.PSIsContainer } | ForEach-Object { $_.FullName })
$arrayOfItemIds = @()
$NrOfFiles = $files.Count
[bool] $FoundDuplicates = $false
Write-Host "There are $NrOfFiles Unicorn item files to check."
foreach ($file in $files)
{
    # Reads the whole file, then keeps only the third line (index 2)
    $thirdLineOfFile = @(Get-Content $file)[2]
    # -contains compares against every element collected so far
    if ($arrayOfItemIds -contains $thirdLineOfFile)
    {
        $FoundDuplicates = $true
        $itemId = $thirdLineOfFile.Split(":")[1].Trim()
        Write-Host "Duplicate item ID found!"
        Write-Host "Item file path: $file"
        Write-Host "Detected duplicate ID: $itemId"
        Write-Host "-------------"
        Write-Host ""
    }
    else
    {
        # += creates a new array and copies all existing elements each time
        $arrayOfItemIds += $thirdLineOfFile
    }
}
if ($FoundDuplicates)
{
    # TeamCity service messages require ' to be escaped as |'
    "##teamcity[buildStatus status='FAILURE' text='One or more duplicate ID|'s were detected in Sitecore serialised items. Check the build log to see which files and ID|'s are involved.']"
    exit 1
}
Write-Host "End script checking for Unicorn serialization errors."
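Part of the cost in a loop like this comes from how the IDs are tracked: `+=` on a PowerShell array builds a new array and copies every existing element on each addition, and `-contains` compares the value against every element, so the total work grows roughly with the square of the number of files. A hash-based set keeps both the lookup and the insert close to constant time on average. The sketch below shows the idea on in-memory strings; `Find-DuplicateString` is a hypothetical name, and the case-insensitive comparer is a choice made here to mirror `-contains`, which ignores case for strings.

```powershell
# Hypothetical helper: returns every value that was already seen earlier in the input.
function Find-DuplicateString {
    param(
        [Parameter(Mandatory = $true)]
        [AllowEmptyCollection()]
        [string[]] $Values
    )

    # Case-insensitive to mirror PowerShell's -contains; drop the comparer for exact matching.
    $seen = New-Object 'System.Collections.Generic.HashSet[string]' -ArgumentList ([System.StringComparer]::OrdinalIgnoreCase)
    $duplicates = New-Object 'System.Collections.Generic.List[string]'

    foreach ($value in $Values) {
        # HashSet.Add returns $false when an equal value is already present (O(1) on average).
        if (-not $seen.Add($value)) {
            $duplicates.Add($value)
        }
    }

    # Emits the duplicates to the pipeline in input order.
    $duplicates
}
```

For example, `Find-DuplicateString -Values @('a', 'B', 'A', 'c', 'b')` should emit `A` and `b`, because each matches an earlier value when case is ignored.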
The problem is: it's very slow! The folder this script has to check currently contains over 14,000 .item files, and that number will very likely keep growing. I understand that opening and reading so many files is an expensive operation, but I didn't expect it to take approximately half an hour. That is far too long: it would add half an hour to every (snapshot) build, which is unacceptable. I had hoped the script would finish within a couple of minutes at most.
I can't believe there isn't a faster way to do this, so any help is greatly appreciated!
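The other cost is reading whole files: `Get-Content $file` reads every line even though only the third one is used. `Get-Content` has a `-TotalCount` parameter that stops after the given number of lines. A minimal sketch, where `Get-ThirdLine` is a hypothetical helper name; no timings were measured for it here, so treat any speed-up as something to verify on the real folder.

```powershell
# Hypothetical helper: returns the third line of a file, or $null if the file is shorter.
function Get-ThirdLine {
    param([Parameter(Mandatory = $true)][string] $Path)

    # -TotalCount 3 stops reading after three lines; @() keeps the result an array.
    $lines = @(Get-Content -LiteralPath $Path -TotalCount 3)
    if ($lines.Count -lt 3) { return $null }
    return $lines[2]
}
```

Inside the loop, `$thirdLineOfFile = Get-ThirdLine $file` would take the place of the full `Get-Content` call.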
Solution
Well, I have to say that all three answers I have received so far helped me with this one. I first switched to using the .NET Framework classes directly, and then also used a dictionary to solve the growing-array problem. My original script took about 30 minutes; using the .NET Framework classes brought that down to just 2 minutes, and adding the dictionary brought it down to just 6 or 7 seconds! The final script that I use:
Write-Host "Start checking for Unicorn serialization errors."
[String[]] $allFilePaths = [System.IO.Directory]::GetFiles("%system.teamcity.build.workingDir%\Sitecore\serialization", "*.item", "AllDirectories")
# Maps each third line (the ID line) to the first file it was found in
$IdsProcessed = New-Object 'System.Collections.Generic.Dictionary[string,string]'
[bool] $FoundDuplicates = $false
$NrOfFiles = $allFilePaths.Length
Write-Host "There are $NrOfFiles Unicorn item files to check."
Write-Host ""
foreach ($filePath in $allFilePaths)
{
    [System.IO.StreamReader] $sr = [System.IO.File]::OpenText($filePath)
    try
    {
        $null = $sr.ReadLine() # skip the first line
        $null = $sr.ReadLine() # skip the second line
        [string] $thirdLineOfFile = $sr.ReadLine()
    }
    finally
    {
        $sr.Close() # always release the file handle, even if reading fails
    }
    if ($IdsProcessed.ContainsKey($thirdLineOfFile))
    {
        $FoundDuplicates = $true
        $itemId = $thirdLineOfFile.Split(":")[1].Trim()
        $otherFileWithSameId = $IdsProcessed[$thirdLineOfFile]
        Write-Host "---------------"
        Write-Host "Duplicate item ID found!"
        Write-Host "Detected duplicate ID: $itemId"
        Write-Host "Item file path 1: $filePath"
        Write-Host "Item file path 2: $otherFileWithSameId"
        Write-Host "---------------"
        Write-Host ""
    }
    else
    {
        $IdsProcessed.Add($thirdLineOfFile, $filePath)
    }
}
if ($FoundDuplicates)
{
    # TeamCity service messages require ' to be escaped as |'
    "##teamcity[buildStatus status='FAILURE' text='One or more duplicate ID|'s were detected in Sitecore serialised items. Check the build log to see which files and ID|'s are involved.']"
    exit 1
}
Write-Host "End script checking for Unicorn serialization errors. No duplicate IDs were found."
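The same approach could be packaged as a function so it can be exercised against a small test folder. The sketch below keeps the final script's technique (`Directory.GetFiles`, a `StreamReader` that reads three lines, a dictionary keyed on the third line, whose `ContainsKey` lookup is constant time on average) and adds a few assumptions of its own: `Find-DuplicateItemId` is a hypothetical name, files with fewer than three lines are skipped, the ID is taken as everything after the first colon, and the dictionary uses a case-insensitive comparer so that it behaves like the original `-contains` check (a dictionary created without a comparer is case-sensitive). .NET methods resolve relative paths against the process working directory rather than the current PowerShell location, so an absolute root path is the safe choice.

```powershell
# Hypothetical function: emits one object per file whose third line was already seen.
function Find-DuplicateItemId {
    param(
        [Parameter(Mandatory = $true)][string] $Root,   # absolute path recommended
        [string] $Filter = '*.item'
    )

    $files = [System.IO.Directory]::GetFiles($Root, $Filter, [System.IO.SearchOption]::AllDirectories)
    # Third line -> first file it appeared in; case-insensitive like -contains.
    $idsProcessed = New-Object 'System.Collections.Generic.Dictionary[string,string]' -ArgumentList ([System.StringComparer]::OrdinalIgnoreCase)

    foreach ($filePath in $files) {
        $reader = [System.IO.File]::OpenText($filePath)
        try {
            $null = $reader.ReadLine()        # line 1, unused
            $null = $reader.ReadLine()        # line 2, unused
            $thirdLine = $reader.ReadLine()   # line 3, or $null if the file is shorter
        }
        finally {
            $reader.Dispose()                 # release the handle even on errors
        }

        if ($null -eq $thirdLine) { continue }   # assumption: skip short files

        if ($idsProcessed.ContainsKey($thirdLine)) {
            [pscustomobject]@{
                Id        = ($thirdLine -replace '^[^:]*:', '').Trim()   # text after the first colon
                FirstPath = $idsProcessed[$thirdLine]
                Path      = $filePath
            }
        }
        else {
            $idsProcessed.Add($thirdLine, $filePath)
        }
    }
}
```

In the build step, the caller would log each returned object and, when the result is non-empty, print the `buildStatus` service message and `exit 1` as the script above does.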
So thanks to all!