3

I have a question about using Linq in PowerShell. I can not figure out how to correctly use the Except method

Example tables:

$Arr = 1..1000
$Props = ("employeeID","FindName1","FindName2")
$Table1 = New-Object System.Data.DataTable "Table1"
$Props | ForEach-Object { $Table1.Columns.Add( $_ , [String]) | Out-Null }

ForEach ($Record in $Arr ) {
    $Row = $Table1.NewRow()
    $Row.employeeID = $Record.ToString("00000")
    $Row.FindName1 = "UserName_" + $Record.ToString()
    $Row.FindName2 = "String_" + $Record.ToString("00000000")
    $Table1.Rows.Add($Row)
}

$Arr2 = 980..1111
$Props = ("employeeID","FindName1")
$Table2 = New-Object System.Data.DataTable "Table2"
$Props | ForEach-Object { $Table2.Columns.Add( $_ , [String]) | Out-Null }

ForEach ($Record in $Arr2 ) {
    $Row = $Table2.NewRow()
    $Row.employeeID = $Record.ToString("00000")
    $Row.FindName1 = "UserName_" + $Record.ToString()
    $Table2.Rows.Add($Row)
}

As a result of the work, I want to get records from the $table1 where FindName1 not in $Table2.FindName1, preserving all the headers

An attempt to perform does not produce the expected result.

$ExceptOut = [System.Linq.Enumerable]::Except($Table1.FindName1, $Table2.FindName1)

As I understood from the article , i need to create my own class with methods that allow me to use a LINQ in the tables. But I am extremely far from programming. Or maybe there is some other fast analogue of "NOT IN" in SQL. I hope for help. Thanks.

2 Answers2

5

For the (generic) .Except() LINQ method to work, the two enumerables (IEnumerable<T>) passed as arguments must:

  • enumerate instances of the same type T
  • and, if that type is a reference type whose instance should compare meaningfully based on the content of instances (rather than by mere reference equality, i.e. identity), must implement the IEquatable<T> interface and/or override the .Equals() method.

PowerShell is seemingly not able to find the right overload for .Except() with the [object[]] arrays returned by $Table1.FindName1 and $Table2.FindName1, though these arrays technically fulfill the above requirements - I don't know why.

However, simply casting these arrays to what they already are - [object[]] - solves the problem:

[Linq.Enumerable]::Except([object[]] $Table1.FindName1, [object[]] $Table2.FindName1)

Given that the .FindName1 column ultimately contains strings, you can also cast to [string[]], though this implicitly creates a copy of each array, which is unnecessary here.


Now if you want to return whole rows while using the .FindName1 column only for comparison, things get much more complex:

  • You must implement a custom comparer class that implements the IEqualityComparer[T]interface.

  • You must cast the .Rows collection of the data tables to IEnumerable[DataRow], which requires calling the System.Linq.Enumerable.Cast() method via reflection.

    • Note: While you could directly cast to [DataRow[]], this would involve inefficient conversion of the rows collection to an array.

Here's a PSv5+ solution that implements the custom comparer class as a PowerShell class:

# A custom comparer class that compares two DataRow instances by their
# .FindName1 column.
class CustomTableComparer : Collections.Generic.IEqualityComparer[Data.DataRow] {
  [bool] Equals([Data.DataRow] $x, [Data.DataRow] $y) {
    return [string]::Equals($x.FindName1, $y.FindName1, 'Ordinal')
  }
  [int] GetHashCode([Data.DataRow] $row) {
    # Note: Any two rows for which Equals() returns $true must return the same
    #       hash code. Because *ordinal, case-sensitive* string comparison is
    #       used above, it's sufficient to simply call .GetHashCode() on
    #       the .FindName1 property value, but that would have to be tweaked
    #       for other types of string comparisons.
    return $row.FindName1.GetHashCode();
  }
}


# Use reflection to get a reference to a .Cast() method instantiation 
# that casts to IEnumerable<DataRow>.
$toIEnumerable = [Linq.Enumerable].GetMethod('Cast').MakeGenericMethod([Data.DataRow])

# Call .Except() with the casts and the custom comparer.
# Note the need to wrap the .Rows value in an aux. single-element
# array - (, ...) - for it to be treated as a single argument.
[Linq.Enumerable]::Except(
    $toIEnumerable.Invoke($null, (, $Table1.Rows)), 
    $toIEnumerable.Invoke($null, (, $Table2.Rows)), 
    [CustomTableComparer]::new()
)

GitHub issue #2226 proposes making LINQ a first-class PowerShell citizen.

mklement0
  • 382,024
  • 64
  • 607
  • 775
  • 1
    Thank you very much! It's like what I was looking for =) `Compare-Object` method from second message is too slow for my task – Vasiliy Pupkin Mar 04 '19 at 07:50
  • Glad to hear it was helpful, @VasiliyPupkin, Yeah, I'm not surprised that `Compare-Object` is too slow. – mklement0 Mar 04 '19 at 08:49
1

To complement the LINQ-based answer with a native PowerShell solution:

The Compare-Object cmdlet allows you to compare collections, but note that while it is more concise, it is also much slower than the LINQ-based solution:

Compare-Object -PassThru -Property FindName1 `
  ([Data.DataRow[]] $Table1.Rows) `
  ([Data.DataRow[]] $Table2.Rows) | Where-Object SideIndicator -eq '<='
  • Casting [Data.DataRow[]] - which creates a new array from the rows collection - is seemingly needed for Compare-Object to recognize the rows as an enumerable.

    • Calling .GetEnumerator() or casting to Collections.IEnumerable doesn't help, and casting to Collections.Generic.IEnumerable[Data.DataRow]] fails.
  • -Property FindName1 specifies the comparison property, i.e., the property to compare the rows by.

  • -PassThru is needed to make Compare-Object output the input objects as-is, instead of custom objects that contain only the property/ies specified with -Property.

    • Note that the objects are decorated with a .SideIndicator NoteProperty member, however, using PowerShell's ETS (extended type system) - see below.
  • Given that Compare-Object outputs input objects that are unique to either collection, Where-Object SideIndicator -eq '<=' must be used to limit the results to those difference objects that are unique to the LHS input collection (which is signaled via a .SideIndicator property value of '<=' - the arrow points to the side the object is unique to).

This GitHub issue proposes a number of improvements to the Compare-Object cmdlet, which could help simplify and speed up the above solution.
That said, the proposal to make LINQ a first-class PowerShell citizen holds much more promise.

mklement0
  • 382,024
  • 64
  • 607
  • 775