
I have been banging my head on this problem for a few hours.

I have a multi-dimensional array and I need to select the unique items based on two "columns".

Is there an efficient .NET (or other) way to do this and achieve the desired output?

The data looks something like:

ComputerName, IPAddress, MacAddress, FirstObserved

I would like to determine unique values based on MacAddress and ComputerName, keeping the entry with the oldest FirstObserved date.

I have tried the usual PowerShell way of doing this, but it's horribly slow, to say the least.

$data | Group-Object -Property ComputerName,MacAddress | ForEach-Object{$_.Group | Sort-Object -Property FirstObserved | Select-Object -First 1}
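One thing worth noting: if FirstObserved comes from Import-Csv it is a string, so Sort-Object compares text rather than dates. A minimal sketch of the same Group-Object approach with a calculated sort property that parses the date first; the `$data` rows here are made up for illustration:

```powershell
# Hypothetical sample rows; real data would come from Import-Csv
$data = @(
    [pscustomobject]@{ ComputerName='computer1'; IPAddress='192.168.10.1'; MacAddress='33:44:55:66:77:88'; FirstObserved='10-01-2012' }
    [pscustomobject]@{ ComputerName='computer1'; IPAddress='192.168.10.2'; MacAddress='33:44:55:66:77:88'; FirstObserved='9-30-2012' }
)

# Sort on the parsed [datetime], not the raw string
$oldest = $data |
    Group-Object -Property ComputerName, MacAddress |
    ForEach-Object {
        $_.Group |
            Sort-Object -Property @{ Expression = { $_.FirstObserved -as [datetime] } } |
            Select-Object -First 1
    }
```

This only fixes the string-vs-date ordering; it is still the same group-then-sort approach and stays slow on large inputs.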

In a perfect world I would have a list of items with no duplicates, keeping the oldest entry based on the FirstObserved date.

BDubs
  • Can you show me an example of what the array looks like? – ArcSet Jun 02 '19 at 16:43
  • Thanks for the quick response this is driving me nuts haha. – BDubs Jun 02 '19 at 17:10
  • The data looks similar to: computer1, 192.168.10.1, 33:44:55 66,10-20-2012 sorry I am mobile at the moment. Comes from a CSV. There is anywhere from 100k to 1million items. I will post more when I get to a computer. Thanks again for the response. – BDubs Jun 02 '19 at 17:14
  • Same issue as with your last question: to sort by a string with a date in `dd-MM-yyyy` format, first convert it to a `[datetime]`. –  Jun 02 '19 at 17:20
  • I am able to sort it's just horribly slow via the typical Powershell methods. Was looking for ideas on ways to speed it up. – BDubs Jun 02 '19 at 17:30
  • Are you saying the array looks like this `@(@("computer1","192.168.1.1", "33:44:55 66", "10-10-2012"), @("computer2","192.168.1.2", "33:44:55 66", "10-11-2012"))` – ArcSet Jun 02 '19 at 18:36
  • If it _comes from a csv_, then why bother with this array at all? Can't you simply import the csv and take it from there? – Theo Jun 02 '19 at 19:38
  • Thanks all. So the CSV is part of the data, and the other half of the data comes from an external system and is already loaded in memory. The idea was to bring in the CSV and deduplicate the information with the loaded data, and/or save out the loaded data and dedupe on the other side (two CSV's, two JSON's...etc). – BDubs Jun 02 '19 at 20:23

1 Answer


You can implement the grouping manually with a hashtable:

$FirstObserved = @{}
Import-Csv .\data.csv |ForEach-Object {
    $key  = $_.ComputerName,$_.MacAddress -join ','
    $date = $_.FirstObserved -as [datetime]

    # Check if we already have an entry for this Name + MAC combo
    if($FirstObserved.Contains($key))
    {
        # Make sure the current date is older than what we have already
        if($FirstObserved[$key] -gt $date)
        {
            $FirstObserved[$key] = $date
        }
    }
    else
    {
        # First instance of the Name + MAC combo, save the date
        $FirstObserved[$key] = $date
    }
}

Now you can easily look up the first date for a combination with:

$FirstObserved["Computer-01,11:22:33:44:55:66"]

If you want to export the list to another csv, all you need to do is turn the key-value pairs of the hashtable into objects:

$FirstObserved.GetEnumerator() |Select-Object @{Name='Identity';Expression={$_.Key}},@{Name='FirstObserved';Expression={$_.Value}} |Export-Csv -Path output.csv -NoTypeInformation
Mathias R. Jessen
  • Thanks for the reply. That works great, but I end up dropping the other values. The data before the dedup process contains ComputerName, IPAddress, MacAddress, and FirstObserved. I would like the results to contain that information, just with the duplicates removed. Again, thanks for your time today; it is greatly appreciated. – BDubs Jun 02 '19 at 21:11
  • 1
    @BDubs easy, just store `$_` inside the hashtable and compare `$date -le $FirstObserved[$key].FirstObserved` instead, then export `$FirstObserved.Values` – Mathias R. Jessen Jun 02 '19 at 21:35
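A sketch of that suggestion, storing the whole record so ComputerName, IPAddress, MacAddress and FirstObserved all survive the dedup. The sample `$data` rows are invented for illustration; with real data the pipeline input would come from `Import-Csv` instead:

```powershell
# Hypothetical sample rows standing in for Import-Csv output
$data = @(
    [pscustomobject]@{ ComputerName='computer1'; IPAddress='192.168.10.1'; MacAddress='33:44:55:66:77:88'; FirstObserved='10-20-2012' }
    [pscustomobject]@{ ComputerName='computer1'; IPAddress='192.168.10.2'; MacAddress='33:44:55:66:77:88'; FirstObserved='10-19-2012' }
    [pscustomobject]@{ ComputerName='computer2'; IPAddress='192.168.10.3'; MacAddress='aa:bb:cc:dd:ee:ff'; FirstObserved='01-05-2013' }
)

$FirstObserved = @{}
$data | ForEach-Object {
    $key  = $_.ComputerName,$_.MacAddress -join ','
    $date = $_.FirstObserved -as [datetime]

    # Store the whole row the first time; replace it only when this row is older
    if(-not $FirstObserved.Contains($key) -or
       $date -lt ($FirstObserved[$key].FirstObserved -as [datetime]))
    {
        $FirstObserved[$key] = $_
    }
}

# Every original column survives the dedup
$FirstObserved.Values | Export-Csv -Path output.csv -NoTypeInformation
</imports>
```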