0

I need to compare 2 CSV files, and output the differences contained in the larger one, that are not present in smaller one based on 2 headers (SiteName, PrimarySMTPAddress).

At present, i can do this using the below:

#Importing CSV
$File1 = Import-Csv -Path "C:\Data1-Large.csv"

#Importing CSV 
$File2 = Import-Csv -Path "Data2.csv"

#Compare both CSV files - column PrimarySMTPAddress
$Results = Compare-Object  $File1 $File2 -Property PrimarySMTPAddress -IncludeEqual

$Array = @()       
Foreach($R in $Results)
{
    If( $R.sideindicator -eq "==" )
    {
        $Object = [pscustomobject][ordered] @{

            PrimarySMTPAddress = $R.PrimarySMTPAddress
            "Compare indicator" = $R.sideindicator

        }
        $Array += $Object
    }
}

#Count users in both files
($Array | sort-object username | Select-Object * -Unique).count

#Display results in console
$Array

The Problem with the above is - It will show the differences between the large CSV and the small CSV - it will spit out ALL the Primary SMTP Addresses in the large CSV.

I need to get the output to filter on the 2nd Column SiteName, so that only SiteNames that match the other CSV are enumerated (the large CSV contains 12000 ROWS vs 2500 in small one). Filtering with Compare-Object isn't the Same, i can't see how to use Where-Object to filter down as i would normally.

I'm new to Arrays so, would appreciate any help.

Thanks in advance.

Royston
  • 433
  • 2
  • 9
  • 25
  • When you use `gc`, you don’t get any properties with references to your data. You should use `import-CSV` instead. If your only intention is to keep differences, then using `-includeequal` just adds extra, unnecessary data. – AdminOfThings Feb 27 '20 at 09:47
  • Thanks, My intention is to show the differences only so i can then add them into the reference object – Royston Feb 27 '20 at 09:51
  • `compare-object` can work with arrays for its reference and difference objects. There is no need to loop to extract differences. – AdminOfThings Feb 27 '20 at 09:52
  • What i'm trying to do isn't that straightforward. I'm taking 2 data sets and then attempting to enumerate through each subset of data in parallel. – Royston Feb 27 '20 at 10:03
  • Ok. But why? If it is just for academic purposes, then that’s fine. But what you are doing with your code does not require it. – AdminOfThings Feb 27 '20 at 10:10
  • Based on your reconstructed question, am I correct in assuming the following: 1) The large CSV contains multiple, different sitename values. 2) The small CSV contains one sitename value that is repeated on every line. 3) You want to compare primarysmtpaddress values in each CSV where the sitename values match across both files. – AdminOfThings Feb 27 '20 at 11:31
  • Yes, all correct as above except 2) The small CSV contains a small subset of sitename values. I would like to output the missing smtp addresses per site from the larger dataset not contained within the smaller set. Thank you. – Royston Feb 27 '20 at 11:33
  • so why won't `Compare-Object $File1 $File2 -Property PrimarySMTPAddress,SiteName -IncludeEqual -Passthru` work here? Then it will find uniqueness based on both properties. – AdminOfThings Feb 27 '20 at 11:40
  • Ah, i forgot to add a key point (i was hoping to fix that at some point at the end), the SiteName Data is not the same between the 2 CSV's i'd need to use a wildcard or substring(0,7) to make that work as the first 0-7 text characters mostly match). Am looking into it now, any ideas greatly welcomed. – Royston Feb 27 '20 at 11:47
  • Then you can just use `Compare-Object $File1 $File2 -Property PrimarySMTPAddress,{$_.SiteName.Substring(0,7)} -Passthru`. – AdminOfThings Feb 27 '20 at 12:30

1 Answers1

0

You can do the following provided you have real CSV files with column headers.

$DataSet1 = Import-Csv C:\provisioningdata\Provision-P3-D
$DataSet2 = Import-Csv C:\provisioningdata\Provision-P3
Compare-Object $DataSet1 $DataSet2 -Property PrimarySmtpAddress | 
     Where SideIndicator -eq '=>' |
         Select-Object @{n='Username';e={$_.PrimarySmtpAddress}}

Regarding parallel traversal through two arrays, using a for loop is easiest. However, you are really relying on the order of your data being predictable.

for ($i = 0; $i -lt $DataSet2.Count; $i++) {
    Compare-Object $DataSet1[$i] $DataSet2[$i] -Property PrimarySmtpAddress | 
         Where SideIndicator -eq '=>' |
             Select-Object @{n='Username';e={$_.PrimarySmtpAddress}}
}

EDIT

I am not sure why there is a need for traversing by site when Compare-Object supports multiple properties and expressions, but you can do the following:

$File1 = Import-Csv -Path "C:\Data1-Large.csv"
$File2 = Import-Csv -Path "Data2.csv"

$SiteGroup1 = $File1 | Group-Object -Property {$_.SiteName -replace '(?<=^.{7}).*'}
$SiteGroup2 = $File2 | Group-Object -Property {$_.SiteName -replace '(?<=^.{7}).*'}

$Results = foreach ($site in $SiteGroup2) {
    Compare-Object $SiteGroup1.Where{$_.Name -eq $site.Name}.Group $site.Group -Property PrimarySmtpAddress
}

Using Compare-Object with multiple properties and expressions:

Compare-Object $File1 $File2 -Property PrimarySMTPAddress,{$_.SiteName -replace '(?<=^.{7}).*'}
AdminOfThings
  • 23,946
  • 4
  • 17
  • 27
  • Thanks, but i'm not sure this is what will help - this doesn't account for the fact that within each dataset there's an individual CSV file. I also have a large amount of data in the reference set, so, there will be hundreds of smtp address spat out that are irrelevant – Royston Feb 27 '20 at 10:22
  • I think you need to provide samples of what your CSV files contain and then state what your end result should be. You are making it sound like your only issue is reading the CSV data. Your code above is doing nothing regarding reading multi-depth CSV data. – AdminOfThings Feb 27 '20 at 10:26
  • Thank you for your help, i've changed the original question - to try a more simple meithod with all the data in both CSV's clumped together (i tried to split it out by SiteName) – Royston Feb 27 '20 at 11:18
  • @royston, I added an edit that may help some. Let me know if that gets you closer. – AdminOfThings Feb 27 '20 at 12:45
  • Thanks a lot for your help, i think the data is inconsistent so, not much we could do. – Royston Feb 27 '20 at 14:48