0

Have a bunch of values from csv file, with column looking like this:

shouston
cgonzalez
bbrown
hlader
kpesavento
jbloom
polson
bcharlow
bcharlow
bkalt

Need to find duplicates and modify them.

# Grab CSV file
$inputFile = Import-Csv -Path $filePath
$text = (Get-Culture).TextInfo
$HashSet = [System.Collections.Generic.HashSet[string]]::new([System.StringComparer]::OrdinalIgnoreCase)

foreach ($line in $inputFile) {
    $name = $line.name
    $line.name = $text.ToTitleCase($name)
    $firstName = $line.name.split(" ")[0]
    $lastName = $line.name.split(" ")[1]
    $newEmail = ($firstName[0] + $lastName).toLower()

    if (!$HashSet.Add($newEmail)) {
        $line.email = ($firstName[0] + $lastName + $line.location_id + "@abc.com").toLower()
    }
    else {
        $line.email = ($firstName[0] + $lastName + "@abc.com").toLower()
    }
} 

$inputFile | Export-Csv $fullPath

But its not working as I expected - find a duplicate values and modify them. In my case, the location must also be added to the first duplicate email, example bcharlow1 bcharlow2, not it's bcharlow, bcharlow2. Can anyone help me - how I need to modify script?

Art
  • 29
  • 7

2 Answers2

2

You can use Group-Object to see the duplicates, so doing something like :

$val=@(
'**name**',
'shouston',
'cgonzalez',
'bbrown',
'hlader',
'kpesavento',
'jbloom',
'polson',
'bcharlow',
'bcharlow',
'bkalt'
)

$val | Group-Object

Will give you a count of how many times the given value is been seen. If you want to see lets say the top 3 you can pipe it to sort and select

$val | Group-Object -NoElement | Sort-Object Count -Descending | Select-Object -First 3

Count Name                     
----- ----                     
    2 bcharlow                 
    1 **name**                 
    1 shouston 
Vasil Nikolov
  • 667
  • 6
  • 16
  • Thanks for the answer. this script didn't give me anything to work with and apply the if conditions. Result Count Name ----- ---- 1 shouston 1 cgonzalez 1 bbrown 1 hlader 1 kpesavento 1 jbloom 1 polson 1 bcharlow 1 bcharlow bcharlow occurs several times, but for some reason it is considered each time as a unique value – Art Nov 11 '22 at 11:34
  • 1
    @lunnyj You could do something like this to rename duplicates: `$val | Group-Object | ForEach-Object { if( $_.Count -gt 1 ) { foreach( $i in 0..($_.Group.Count-1) ) { '{0}{1}' -f $_.Group[$i], ($i + 1) } } else { $_.Group } }` – zett42 Nov 11 '22 at 12:06
  • 1
    I worked off the example you provide, if they look the same but appear different it means they are likely having something different like an extra space or a hidden character, which ideally should be addressed before you use Group-Object. But that would be different problem than the one you have here. – Vasil Nikolov Nov 11 '22 at 14:55
1

Probably the easiest (and fastest) way to do this, is using a HashSet:

$InputFile = ConvertFrom-Csv @'
id, location_id, name,                 title,                              email, department
 1,           1, "Susan houston",      DS,                                      ,
 2,           1, "Christina Gonzalez", D,                                       ,
 3,           2, "Brenda brown",       "Director, Second Career Services",      ,
 4,           3, "Howard Lader",       "Manager, Senior Counseling",            ,
 5,           4, "Kimberly Pesavento", "Commercial director",                   ,
 6,           5, "Joe Bloom",          Finance,                                 ,          7
 8,           6, "Bart charlow",       Director,                                ,
 9,           7, "Bart Charlow",       Director,                                ,
'@

$Emails = [System.Collections.Generic.HashSet[string]]::new([System.StringComparer]::OrdinalIgnoreCase)

foreach ($line in $inputFile) {
    $name = $line.name
    $line.name = (Get-Culture).TextInfo.ToTitleCase($name)
    $firstName = $line.name.split(" ")[0]
    $lastName = $line.name.split(" ")[1]
    $newEmail = ($firstName[0] + $lastName).toLower()

    if (!$Emails.Add($newEmail)) {
        $line.email = ($firstName[0] + $lastName + $line.location_id + "@abc.com").toLower()
    }
    else {
        $line.email = ($firstName[0] + $lastName + "@abc.com").toLower()
    }
}

$InputFile |ConvertTo-Csv
"id","location_id","name","title","email","department"
"1","1","Susan Houston","DS","shouston@abc.com",""
"2","1","Christina Gonzalez","D","cgonzalez@abc.com",""
"3","2","Brenda Brown","Director, Second Career Services","bbrown@abc.com",""
"4","3","Howard Lader","Manager, Senior Counseling","hlader@abc.com",""
"5","4","Kimberly Pesavento","Commercial director","kpesavento@abc.com",""
"6","5","Joe Bloom","Finance","jbloom@abc.com","7"
"8","6","Bart Charlow","Director","bcharlow@abc.com",""
"9","7","Bart Charlow","Director","bcharlow7@abc.com",
iRon
  • 20,463
  • 10
  • 53
  • 79
  • Amazing! But in my case, the location (its a value from another column) must also be added to the first duplicate email. Example - 'bcharlow1', 'bcharlow2'. Now its 'bcharlow', 'bcharlow2'. How script can be modifyid? – Art Nov 11 '22 at 12:02
  • 1
    @lunnyj, please, paste a minimal example of the `csv` file (with headers) in your question. Basically, you might start of with something like: `$Emails = [System.Collections.Generic.HashSet[string]]::new([String[]]($InputObject).Email, [System.StringComparer]::OrdinalIgnoreCase)` – iRon Nov 11 '22 at 12:05
  • its personal data, random generated names. Also I added my full script, at first message id,location_id,name,title,email,department 1,1,Susan houston,DS,, 2,1,Christina Gonzalez,D,, 3,2,Brenda brown,"Director, Second Career Services",, 4,3,Howard Lader,"Manager, Senior Counseling",, 5,4,Kimberly Pesavento,Commercial director,, 6,5,Joe Bloom,Finance,, 7,6,peter Olson,Director,, 8,6,Bart charlow,Director,, 9,7,Bart Charlow,Director,, – Art Nov 11 '22 at 12:15
  • In your example not correct output:( All mail values received location, but like I said above, location_idd add only when emails equals - in the example, only Bart Charlow must have location_id in mail, because bcharlow@abc.com a duplicates. – Art Nov 11 '22 at 15:16
  • Thank you, but now again, like at beginning - "8","6","Bart Charlow","Director","bcharlow@abc.com","" without location_id :( – Art Nov 11 '22 at 15:36