0

(This is probably something rather simple I'm missing; but I can't seem to figure it out and haven't found any answers in search)

I need to compare two CSV files with the same columns and output the row differences as follows (final output in Unicode Text):

  • If row exists in FileA but not FileB, label that row "Good"
  • If row exists in FileB but not FileA, label that row "Bad"

Let's say I have the following sample data:

File A:
Column1,Column2,Column3
Tommy,4133,20180204
Suzie,5200,20210112
Tammy,221,20201010

File B:
Column1,Column2,Column3
Tommy,4133,20180204
Nicky,5200,20190520

Here is my current code (borrowing the hash-enabled Compare-Object2 from this site because the delivered Compare-Object is too slow -- FYI, I'm using Get-Content instead of Import-Csv because it's a good 50-times faster since we're comparing entire row. And the MyHeader variable is just to retain the original file's header column values)

Compare-Object2 (Get-Content $FileA) (Get-Content $FileB) -PassThru |
Select-Object @{l=[string]$MyHeader;e={$_.InputObject}},
              @{n='Row Label'; e={ @{'=>' = 'Bad' ; '<=' = 'Good'}[$_.SideIndicator]}},
              @{n='Placeholder'; e={@{'*'='0'}['*']}} |
Sort-Object 'Row Label' -Descending | Export-Csv "$FinalCSV" -NoType;

#Removing " char to create CSV with original and added columns together
Set-Content "$FinalCSV" ((Get-Content "$FinalCSV") -replace '"');

#Convert csv to tab delimited
Import-Csv "$FinalCSV" | Export-Csv "$FinalTXT"  -NoTypeInformation -Delimiter "`t";

#Remove " char and convert to unicode
Set-Content -Encoding UNICODE "$FinalTXT" ((Get-Content "$FinalTXT") -replace '"')

This works perfectly (I know some of it is redundant at the end; but hey: it's the best I could do -- but definitely feel free to fix those parts too!) to create a single output file of the Good and the Bad -- about 40 seconds for two files with 400K rows.

Result File:
Column1 Column2 Column3 Row Label   Placeholder
Suzie   5200    20210112    Good    0
Tammy   221 20201010    Good    0
Nicky   5200    20210112    Bad 0

The problem is, I now need to create them as separate files: one file for the good, one for the bad. So the new needed output would be:

ResultFileGood:
Column1 Column2 Column3 Row Label   Placeholder
Suzie   5200    20210112    Good    0
Tammy   221 20201010    Good    0

ResultFileBad:
Column1 Column2 Column3 Row Label   Placeholder
Nicky   5200    20210112    Bad 0

And I just know there has to be a way to do it without having to run the compare twice - some use of the Where-Object prop or a loop of some sort. I just can't figure it out; so I'm coming to the experts.

Thanks

EDIT: Thanks to postanote, one viable alternative is to just output the combined file and then just split that, which is definitely faster than running the entire compare routine twice. Still would like to see if there's a way to do it directly in the comparison export without the intermediate file; but that's definitely a viable option and what I'm using for now

$FinalHeader = get-content "$FinalTXT" | Select -First 1
$BadOutput = Select-String -Path $FinalTXT -Pattern ('Bad   0')
$GoodOutput = Select-String -Path $FinalTXT -Pattern ('Good 0')
@($FinalHeader,$BadOutput.Line) | Out-File "$FinalBadTXT" -Encoding UNICODE;
@($FinalHeader,$GoodOutput.Line) | Out-File "$FinalGoodTXT" -Encoding UNICODE;
DickieD
  • 3
  • 3
  • Use if/then or try/catch to write to two different log files. Also, making your code purposely readable. You write code for those who follow you and where it will be executed; not for yourself or your workstation. Be kind to others. ***;-}*** – postanote May 20 '21 at 03:01
  • There is probably an even faster way of doing what you're doing without Compare-Object or your custom function, if you just show a few rows on both fileA and fileB. – Santiago Squarzon May 20 '21 at 03:01
  • Thanks guys -- sorry @postanote: since I was only posting an excerpt of the code, I left some of the formatting compacted for space's sake; but I'll be better with formatting in future posts. And I agree @SantiagoSquarzon; and I'm certainly open to other ideas -- I'm admittedly a NON-expert here! I've also added some sample data to the post -- both input and desired output. Also, let me know if it would be more proper to post the Compare-Object2 function (and the variables?) directly in my post instead of just linking (I'm not really sure what is best/proper for the forum) – DickieD May 20 '21 at 14:05

1 Answers1

0

Continuing from my comment.

you've got a lot going on there; i.e., some proxy function, etc.

Mixing these items like you are, you end up with stuff like this... (very simplified of course, and since you are to showing your input you are forcing us to guess to come up with one.)

psEdit -filenames 'D:\temp\book1.txt'
# Results
<#
Site,Dept,Office,Floor
Main,aaa,bbb,ccc
Main0,aaa,bbb,ccc
Branch1,ddd,eee,fff
Branch2,ggg,hhh,iii
#>

psEdit -filenames 'D:\temp\book3.txt'
# Results
<#
Site,Dept,Office,Floor
Main,aaa,bbb,ccc
Branch1,ddd,eee,fff
Branch2,ggg,hhh,iii
Branch3,jjj,kkk,lll
Branch4,mmm,nnn,ooo
#>

Update:

Removing all the previous stuff, since they were not your cup of tea...

;-}

Compare-Object2 -ReferenceObject (Get-Content -Path 'D:\temp\book1.txt') -DifferenceObject (Get-Content -Path 'D:\temp\book3.txt') | 
Export-Csv -Path 'D:\Temp\CompareObject.csv' -NoTypeInformation -Force

(Select-String -Path 'D:\Temp\CompareObject.csv' -Pattern '\<=') -replace '.*CompareObject.*:\"|\"\,.*' | 
ConvertFrom-Csv -Header Site, Dept, Office, Floor | 
Export-Csv -Path 'D:\temp\ReferenceObject.csv' -NoTypeInformation -Force

(Select-String -Path 'D:\Temp\CompareObject.csv' -Pattern '\=>') -replace '.*CompareObject.*:\"|\"\,.*' | 
ConvertFrom-Csv -Header Site, Dept, Office, Floor | 
Export-Csv -Path 'D:\temp\DifferenceObject.csv' -NoTypeInformation -Force

$FileList = 'ReferenceObject.csv', 'DifferenceObject.csv'

$FileList | 
ForEach-Object {
    "`n********* Getting content $PSItem *********`n"
    Import-Csv -Path  "D:\temp\$PSItem"
}
# Results
<#
********* Getting content ReferenceObject.csv *********


Site    Dept Office Floor
----    ---- ------ -----
Main0   aaa  bbb    ccc  

********* Getting content DifferenceObject.csv *********

Branch3 jjj  kkk    lll  
Branch4 mmm  nnn    ooo 
#>

So, as for your last comment:


While that method still uses the intermediary file; I admit I completely wasn't thinking about the simple approach of just exporting the combined file and then just splitting that.***

Ok, then, without using the 'intermediary file'.

($ComparedObjects = Compare-Object2 -ReferenceObject (Get-Content -Path 'D:\temp\book1.txt') -DifferenceObject (Get-Content -Path 'D:\temp\book3.txt'))
# Results
<#
InputObject         SideIndicator
-----------         -------------
Main0,aaa,bbb,ccc   <=           
Branch3,jjj,kkk,lll =>           
Branch4,mmm,nnn,ooo => 
#>

($ComparedObjects -match '<=').InputObject | 
ConvertFrom-Csv -Header Site, Dept, Office, Floor 
# Results
<#
Site  Dept Office Floor
----  ---- ------ -----
Main0 aaa  bbb    ccc  
#>

($ComparedObjects -match '=>').InputObject | 
ConvertFrom-Csv -Header Site, Dept, Office, Floor 
# Results
<#
Site    Dept Office Floor
----    ---- ------ -----
Branch3 jjj  kkk    lll  
Branch4 mmm  nnn    ooo 
#>

Then a just export to csv.

($ComparedObjects -match '<=').InputObject | 
ConvertFrom-Csv -Header Site, Dept, Office, Floor | 
Export-Csv -Path 'D:\temp\ReferenceObject.csv' -NoTypeInformation -Force

($ComparedObjects -match '=>').InputObject | 
ConvertFrom-Csv -Header Site, Dept, Office, Floor | 
Export-Csv -Path 'D:\temp\DifferenceObject.csv' -NoTypeInformation -Force

Read back as needed

$FileList = 'ReferenceObject.csv', 'DifferenceObject.csv'

$FileList | 
ForEach-Object {
    "`n********* Getting content $PSItem *********`n"
    Import-Csv -Path  "D:\temp\$PSItem"
}
# Results
<#
********* Getting content ReferenceObject.csv *********


Site    Dept Office Floor
----    ---- ------ -----
Main0   aaa  bbb    ccc  

********* Getting content DifferenceObject.csv *********

Branch3 jjj  kkk    lll  
Branch4 mmm  nnn    ooo  
#>

Update

Per your comment --


'the problem is the final output need: the Unicode Tab-delimited text with the additional columns.'


(($ComparedObjects -match '<=').InputObject) -replace ',', "`t" | 
ConvertFrom-Csv -Delimiter "`t" -Header Site, Dept, Office, Floor  | 
Export-Csv -Path 'D:\temp\ReferenceObject.csv' -Encoding Unicode -NoTypeInformation -Force
Import-Csv -Path 'D:\temp\ReferenceObject.csv'
# Results
<#
Site  Dept Office Floor
----  ---- ------ -----
Main0 aaa  bbb    ccc  
#>


(($ComparedObjects -match '=>').InputObject) -replace ',', "`t" | 
ConvertFrom-Csv -Delimiter "`t" -Header Site, Dept, Office, Floor  | 
Export-Csv -Path 'D:\temp\DifferenceObject.csv' -Encoding Unicode -NoTypeInformation -Force
Import-Csv -Path 'D:\temp\DifferenceObject.csv'
# Results
<#
Site    Dept Office Floor
----    ---- ------ -----
Branch3 jjj  kkk    lll  
Branch4 mmm  nnn    ooo  
#>

Or for the extra column stuff, you could do this...

$ComparedObjects -match '<=' | 
Select-Object -Property @{
    Name       = 'Site'
    Expression = {($PSItem.InputObject -split ',')[0]}
},
@{
    Name       = 'Dept'
    Expression = {($PSItem.InputObject -split ',')[1]}
},
@{
    Name       = 'Office'
    Expression = {($PSItem.InputObject -split ',')[2]}
},
@{
    Name       = 'Floor'
    Expression = {($PSItem.InputObject -split ',')[3]}
},
@{
    Name       = 'Label'
    Expression = {'Good'}
}, 
@{
    Name       = 'Placeholder'
    Expression = {0}
} |  
Export-Csv -Path 'D:\temp\ReferenceObject.csv' -Encoding Unicode -NoTypeInformation -Force
(Get-Content -Path 'D:\temp\ReferenceObject.csv') -replace '"','' -replace ',', "`t" | 
Set-Content -PassThru 'D:\temp\ReferenceObject.csv'
Import-Csv -Path 'D:\temp\ReferenceObject.csv' -Delimiter "`t" | 
Format-Table -AutoSize
# Results
<#
Site  Dept Office Floor Label Placeholder
----  ---- ------ ----- ----- -----------
Main0 aaa  bbb    ccc   Good  0 
#>


$ComparedObjects -match '=>' | 
Select-Object -Property @{
    Name       = 'Site'
    Expression = {($PSItem.InputObject -split ',')[0]}
},
@{
    Name       = 'Dept'
    Expression = {($PSItem.InputObject -split ',')[1]}
},
@{
    Name       = 'Office'
    Expression = {($PSItem.InputObject -split ',')[2]}
},
@{
    Name       = 'Floor'
    Expression = {($PSItem.InputObject -split ',')[3]}
},
@{
    Name       = 'Label'
    Expression = {'Good'}
}, 
@{
    Name       = 'Placeholder'
    Expression = {0}
} | 
Export-Csv -Path 'D:\temp\DifferenceObject.csv' -Encoding Unicode -NoTypeInformation -Force
(Get-Content -Path 'D:\temp\DifferenceObject.csv') -replace '"','' -replace ',', "`t" | 
Set-Content -PassThru 'D:\temp\DifferenceObject.csv'
Import-Csv -Path 'D:\temp\DifferenceObject.csv' -Delimiter "`t" | 
Format-Table -AutoSize
# Results
<#
Site    Dept Office Floor Label Placeholder
----    ---- ------ ----- ----- -----------
Branch3 jjj  kkk    lll   Good  0          
Branch4 mmm  nnn    ooo   Good  0 
#>
postanote
  • 15,138
  • 2
  • 14
  • 25
  • Thanks postanote! I updated the response with sample data (input and output) - and thanks to Santiago for reformatting (I had only posted an excerpt; so I removed formatting trying to save space! I know better for future posts) I did try your method (still using the hash-enabled Compare-Object2 instead of built-in - let me know if it's better to add the full function code in the post instead of just the link); but the incremental for-each line by line method takes much longer when there are many differences. (But that was probably my fault in not explaining with better details at first) – DickieD May 20 '21 at 19:07
  • No worries. As for `let me know if it's better to add the full function code in the post instead of just the link`. This is really a size dependant thing, and if it is your own code, sure. The SO rules state use an [MRE](https://stackoverflow.com/help/minimal-reproducible-example). As for `I did try your method (still using the hash-enabled Compare-Object2` . Did that get you what you needed? – postanote May 21 '21 at 04:04
  • Thanks. No I'm using the function code verbatim, which was why I only posted the link. So I'll keep the reference to the link instead. Unfortunately no your method didn't help with my "need for speed": while yes it accomplishes creating the two separate files, it took over 10X longer (5000 differences in my two files: 11 minutes and 6 minutes respectively using your for-each line by line export versus about 40 seconds each with my current code). If there were a way to combine the two it'd be perfect! – DickieD May 21 '21 at 13:38
  • See my update for you... regarding speed and a different approach. – postanote May 22 '21 at 09:15
  • Thanks! While that method still uses the intermediary file; I admit I completely wasn't thinking about the simple approach of just exporting the combined file and *then* just splitting that. That's definitely faster than running the whole compare twice (fyi, I just used -ExpandProperty Line instead of the replace); so your answer is definitely worth at least an upvote now - but my profile is still too "new" to upvote :(. Not marking as the answer yet because I'd still like to see if there's a way to do it without the in-between export; but if no others come, I'll mark this as the answer later – DickieD May 23 '21 at 17:01
  • No worries, it's good that it was an edification moment for you though. ;-} Yet, you don't really need the intermediary file at all. See my update for you. – postanote May 24 '21 at 23:53
  • Thanks. Eh... the problem is the final output need: the Unicode Tab-delimited text with the additional columns. For now, I just appended the Select-String to my original consolidated file code to split the file -- only takes a few seconds that way anyway. I updated the post with the added code -- definitely open to suggestions to clean that up (I know some of it is redundant; but it works) – DickieD May 25 '21 at 00:11
  • As for this... ***'the problem is the final output need: the Unicode Tab-delimited text with the additional columns.'***, then just do that, Export-Csv has those features. See my update for you. – postanote May 25 '21 at 00:26
  • Awesome! Thanks for that postanote. That update does exactly what I was looking for. Ironically, for whatever reason, the intermediary file and string split method actually ran a few seconds *faster* than the non (I actually tried it a few times just to make sure it wasn't a system issue or something); but I have to admit your method is MUCH cleaner and more flexible – DickieD May 25 '21 at 18:32
  • No worries. Good to know it helped. – postanote May 26 '21 at 00:11