
I have been banging my head on this problem for a few hours.

I have a multi-dimensional array and I need to select the unique items based on two "columns".

Is there an efficient .NET (or other) way to do this and achieve the desired output?

The data looks something like:

ComputerName, IPAddress, MacAddress, FirstObserved

I would like to determine unique values based on MacAddress and ComputerName, keeping the entry with the oldest FirstObserved date.

I have tried the usual PowerShell way of doing this, but it's horribly slow, to say the least.

$data | Group-Object -Property ComputerName,MacAddress | ForEach-Object{$_.Group | Sort-Object -Property FirstObserved | Select-Object -First 1}
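One thing worth noting: if FirstObserved comes from Import-Csv it is a string, so Sort-Object compares text rather than dates. A minimal sketch of the same Group-Object approach with a calculated sort property that parses the date first; the `$data` rows here are made up for illustration:

```powershell
# Hypothetical sample rows; real data would come from Import-Csv
$data = @(
    [pscustomobject]@{ ComputerName='computer1'; IPAddress='192.168.10.1'; MacAddress='33:44:55:66:77:88'; FirstObserved='10-01-2012' }
    [pscustomobject]@{ ComputerName='computer1'; IPAddress='192.168.10.2'; MacAddress='33:44:55:66:77:88'; FirstObserved='9-30-2012' }
)

# Sort on the parsed [datetime], not the raw string
$oldest = $data |
    Group-Object -Property ComputerName, MacAddress |
    ForEach-Object {
        $_.Group |
            Sort-Object -Property @{ Expression = { $_.FirstObserved -as [datetime] } } |
            Select-Object -First 1
    }
```

This only fixes the string-vs-date ordering; it is still the same group-then-sort approach and stays slow on large inputs.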

In a perfect world I would have a list of items with no duplicates, keeping the oldest entry based on the FirstObserved date.

BDubs
  • Can you show me an example of what the array looks like? – ArcSet Jun 02 '19 at 16:43
  • Thanks for the quick response this is driving me nuts haha. – BDubs Jun 02 '19 at 17:10
  • The data looks similar to: computer1, 192.168.10.1, 33:44:55 66,10-20-2012 sorry I am mobile at the moment. Comes from a CSV. There is anywhere from 100k to 1million items. I will post more when I get to a computer. Thanks again for the response. – BDubs Jun 02 '19 at 17:14
  • Same issue as with your last question: to sort by a string with a date in `dd-MM-yyyy` format, first convert it to a `[datetime]`. –  Jun 02 '19 at 17:20
  • I am able to sort it's just horribly slow via the typical Powershell methods. Was looking for ideas on ways to speed it up. – BDubs Jun 02 '19 at 17:30
  • Are you saying the array looks like this `@(@("computer1","192.168.1.1", "33:44:55 66", "10-10-2012"), @("computer2","192.168.1.2", "33:44:55 66", "10-11-2012"))` – ArcSet Jun 02 '19 at 18:36
  • If it _comes from a csv_, then why bother with this array at all? Can't you simply import the csv and take it from there? – Theo Jun 02 '19 at 19:38
  • Thanks all. So the CSV is part of the data, and the other half of the data comes from an external system and is already loaded in memory. The idea was to bring in the CSV and deduplicate the information with the loaded data, and/or save out the loaded data and dedupe on the other side (two CSV's, two JSON's...etc). – BDubs Jun 02 '19 at 20:23

1 Answer


You can implement the grouping manually with a hashtable:

$FirstObserved = @{}
Import-Csv .\data.csv |ForEach-Object {
    $key  = $_.ComputerName,$_.MacAddress -join ','
    $date = $_.FirstObserved -as [datetime]

    # Check if we already have an entry for this Name + MAC combo
    if($FirstObserved.Contains($key))
    {
        # Make sure the current date is older than what we have already
        if($FirstObserved[$key] -gt $date)
        {
            $FirstObserved[$key] = $date
        }
    }
    else
    {
        # First instance of the Name + MAC combo, save the date
        $FirstObserved[$key] = $date
    }
}

Now you can easily look up the first date for a combination with:

$FirstObserved["Computer-01,11:22:33:44:55:66"]

If you want to export the list to another csv, all you need to do is turn the key-value pairs of the hashtable into objects:

$FirstObserved.GetEnumerator() |Select-Object @{Name='Identity';Expression={$_.Key}},@{Name='FirstObserved';Expression={$_.Value}} |Export-Csv -Path output.csv -NoTypeInformation
Mathias R. Jessen
  • Thanks for the reply. That works great, but I end up dropping the other values. The data before the dedup process contains ComputerName, IPAddress, MacAddress, and FirstObserved. I would like the results to contain that information, just with the duplicates removed. Again, thanks for your time today; it is greatly appreciated. – BDubs Jun 02 '19 at 21:11
  • 1
    @BDubs easy, just store `$_` inside the hashtable and compare `$date -le $FirstObserved[$key].FirstObserved` instead, then export `$FirstObserved.Values` – Mathias R. Jessen Jun 02 '19 at 21:35
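A sketch of that suggestion, storing the whole record so ComputerName, IPAddress, MacAddress and FirstObserved all survive the dedup. The sample `$data` rows are invented for illustration; with real data the pipeline input would come from `Import-Csv` instead:

```powershell
# Hypothetical sample rows standing in for Import-Csv output
$data = @(
    [pscustomobject]@{ ComputerName='computer1'; IPAddress='192.168.10.1'; MacAddress='33:44:55:66:77:88'; FirstObserved='10-20-2012' }
    [pscustomobject]@{ ComputerName='computer1'; IPAddress='192.168.10.2'; MacAddress='33:44:55:66:77:88'; FirstObserved='10-19-2012' }
    [pscustomobject]@{ ComputerName='computer2'; IPAddress='192.168.10.3'; MacAddress='aa:bb:cc:dd:ee:ff'; FirstObserved='01-05-2013' }
)

$FirstObserved = @{}
$data | ForEach-Object {
    $key  = $_.ComputerName,$_.MacAddress -join ','
    $date = $_.FirstObserved -as [datetime]

    # Store the whole row the first time; replace it only when this row is older
    if(-not $FirstObserved.Contains($key) -or
       $date -lt ($FirstObserved[$key].FirstObserved -as [datetime]))
    {
        $FirstObserved[$key] = $_
    }
}

# Every original column survives the dedup
$FirstObserved.Values | Export-Csv -Path output.csv -NoTypeInformation
</imports>
```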