How to remove accents in PowerShell?

Question

I have a script that creates users in Microsoft Exchange Server and Active Directory. Accents and ñ are common in Spanish names, but I want to keep them out of usernames to avoid incompatibilities with older systems.

So, how could I clean a string like this?

$name = "Ramón"

So that it becomes this?

$name = "Ramon"

score 22 · Answer 1 · answered Oct 20 '11 at 19:08

As per ip.'s answer, here is the PowerShell version.

function Remove-Diacritics {
param ([String]$src = [String]::Empty)
  $normalized = $src.Normalize( [Text.NormalizationForm]::FormD )
  $sb = new-object Text.StringBuilder
  $normalized.ToCharArray() | % { 
    if( [Globalization.CharUnicodeInfo]::GetUnicodeCategory($_) -ne [Globalization.UnicodeCategory]::NonSpacingMark) {
      [void]$sb.Append($_)
    }
  }
  $sb.ToString()
}

# Test data
@("Rhône", "Basíl", "Åbo", "", "Gräsäntörmä") | % { Remove-Diacritics $_ }

Output:

Rhone
Basil
Abo

Grasantorma

score 8 · Accepted Answer · penderi · 2020-09-26T20:01:12.243

I can help you with some of the code.

I used this recently in a C# project to strip diacritics from email addresses:

    static string RemoveDiacritics(string input)
    {
        string inputFormD = (input ?? string.Empty).Normalize(NormalizationForm.FormD);
        StringBuilder sb = new StringBuilder();

        for (var i = 0; i < inputFormD.Length; i++)
        {
            UnicodeCategory uc = CharUnicodeInfo.GetUnicodeCategory(inputFormD[i]);
            if (uc != UnicodeCategory.NonSpacingMark)
            {
                sb.Append(inputFormD[i]);
            }
        }

        return (sb.ToString().Normalize(NormalizationForm.FormC));
    }

I guess I can now say "extending this into a PowerShell script/form is left to the reader". Hope it helps.

edited Sep 26 '20 at 20:01

answered Oct 20 '11 at 13:40


+1 Smart snippet, I converted it to PowerShell, it works as expected thanks. – JPBlanc Oct 20 '11 at 14:17
It works pretty fine in PowerShell. Really thanks for sharing :D – Antonio Laguna Nov 30 '11 at 09:37

score 7 · Answer 3 · answered Apr 20 '18 at 12:20

With the help of the above examples, I use this one-liner in a pipeline (tested only on Windows 10):

"öüóőúéáűí".Normalize("FormD") -replace '\p{M}', ''

Result:

ouooueaui

it_specialist
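
Applied to the original question, the one-liner above can be wrapped in a small pipeline-aware function. This is only a sketch: the function name `ConvertTo-PlainUserName` and its parameter are hypothetical, not part of the answer.

```powershell
# Hypothetical helper name; only the Normalize/-replace expression comes from the answer.
function ConvertTo-PlainUserName {
    param(
        [Parameter(ValueFromPipeline)]
        [string]$Name = ''
    )
    process {
        # FormD splits each accented letter into a base letter plus combining mark(s);
        # \p{M} then matches every character in the Unicode "Mark" categories.
        $Name.Normalize([Text.NormalizationForm]::FormD) -replace '\p{M}', ''
    }
}

# Usage: one cleaned name is emitted per input string.
'Ramón', 'Muñoz', 'Gräsäntörmä' | ConvertTo-PlainUserName
```

For these three inputs it should emit Ramon, Munoz and Grasantorma. Letters that have no decomposition (for example ß or ø) pass through unchanged.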


score 7 · Answer 4 · answered Oct 21 '11 at 07:12

Another PowerShell translation of @ip's answer, for non-C# coders ;o)

function Remove-Diacritics 
{
  # [String[]] so that a list is processed item by item instead of being joined into one string
  param ([String[]]$sToModify = @())

  foreach ($s in $sToModify) # Param may be a string or a list of strings
  {
    if ($null -eq $s) { [string]::Empty; continue }

    $sNormalized = $s.Normalize("FormD")
    $res = ""   # reset the accumulator for each input string

    foreach ($c in [Char[]]$sNormalized)
    {
      $uCategory = [System.Globalization.CharUnicodeInfo]::GetUnicodeCategory($c)
      if ($uCategory -ne "NonSpacingMark") {$res += $c}
    }

    $res   # emit one cleaned string per input
  }
}

Clear-Host
$name = "Un été de Raphaël"
Write-Host (Remove-Diacritics $name )
$test = ("äâûê", "éèà", "ùçä")
$test | % {Remove-Diacritics $_}
Remove-Diacritics $test

score 4 · Answer 5 · answered Aug 13 '15 at 15:26

PS> [Text.Encoding]::ASCII.GetString([Text.Encoding]::GetEncoding(1251).GetBytes("Ramón"))
Ramon
PS>

Damian Powell


Fails for some characters, e.g. `Æ×Þ°±ß…`. [A real _Old English_ example](https://www.researchgate.net/publication/277748378_Fore_daere_maerde_mod_astige_two_new_perspectives_on_the_Old_English_Gifts_of_men): returns `Fore ??re m?r?e?` if applied to `Fore ðære mærðe…` – JosefZ Mar 20 '16 at 16:03
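
Letters such as æ, ð, þ, ø and ß have no canonical decomposition, so the normalization-based answers in this thread leave them untouched as well. One possible workaround, sketched below, is to follow the normalization step with an explicit replacement table. The function name and the table contents are illustrative assumptions; choose the transliterations your target systems expect.

```powershell
# Hypothetical helper; the replacement pairs are an illustrative assumption.
function Remove-DiacriticsWithFallback {
    param([string]$Text = '')

    # Letters without a canonical decomposition, mapped by hand.
    # Pairs are kept in an array because PowerShell hashtable keys are case-insensitive.
    $pairs = @(
        @('æ', 'ae'), @('Æ', 'AE'),
        @('ø', 'o'),  @('Ø', 'O'),
        @('ð', 'd'),  @('Ð', 'D'),
        @('þ', 'th'), @('Þ', 'Th'),
        @('ł', 'l'),  @('Ł', 'L'),
        @('ß', 'ss')
    )

    # Step 1: strip combining marks after canonical decomposition.
    $result = $Text.Normalize([Text.NormalizationForm]::FormD) -replace '\p{M}', ''

    # Step 2: apply the manual table; String.Replace(string, string) is ordinal and case-sensitive.
    foreach ($p in $pairs) {
        $result = $result.Replace($p[0], $p[1])
    }
    $result
}

Remove-DiacriticsWithFallback 'Fore ðære mærðe'
```

With this table the Old English sample from the comment should come out as `Fore daere maerde`. Symbols such as × ° ± are not letters and are left as-is.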

score 3 · Answer 6 · answered Mar 23 '18 at 19:55

Instead of creating a StringBuilder and looping over the characters, you can just use -replace on the NFD string to remove the combining marks:

function Remove-Diacritics {
param ([String]$src = [String]::Empty)
  $normalized = $src.Normalize( [Text.NormalizationForm]::FormD )
  ($normalized -replace '\p{M}', '')
}
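
The function above returns text that is still in decomposed form, whereas the C# version in this thread normalizes back to FormC before returning. A variant that does the same in PowerShell might look like the sketch below. The name `Remove-DiacriticsRecomposed` is hypothetical, and `\p{Mn}` (non-spacing marks only) is used here to mirror the `NonSpacingMark` check of the C# code rather than the broader `\p{M}`.

```powershell
# Hypothetical variant; not taken from any answer in this thread.
function Remove-DiacriticsRecomposed {
    param([String]$src = [String]::Empty)

    # Decompose, drop non-spacing marks (Unicode category Mn) ...
    $stripped = $src.Normalize([Text.NormalizationForm]::FormD) -replace '\p{Mn}', ''

    # ... then recompose, so sequences that FormD split apart but that carry
    # no diacritic (e.g. Hangul syllables) are put back together.
    $stripped.Normalize([Text.NormalizationForm]::FormC)
}

Remove-DiacriticsRecomposed 'Ramón'
```

For plain Latin names the result is the same as before. The recomposition step only matters for text that FormD decomposes into something other than base letter plus mark.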

score 2 · Answer 7 · answered Jun 21 '13 at 13:17

Another solution... quickly "reuse" your C# in PowerShell (C# code credits lost somewhere on the net).

Add-Type -TypeDefinition @"
    using System.Text;
    using System.Globalization;

    public class Utils
    {
        public static string RemoveDiacritics(string stIn)
        {
            string stFormD = stIn.Normalize(NormalizationForm.FormD);
            StringBuilder sb = new StringBuilder();

            for (int ich = 0; ich < stFormD.Length; ich++)
            {
                UnicodeCategory uc = CharUnicodeInfo.GetUnicodeCategory(stFormD[ich]);
                if (uc != UnicodeCategory.NonSpacingMark)
                {
                    sb.Append(stFormD[ich]);
                }
            }
            return (sb.ToString().Normalize(NormalizationForm.FormC));
        }
    }
"@ | Out-Null

[Utils]::RemoveDiacritics("ABC-abc-ČŠŽ-čšž")
