I have the following scenario: a huge text file that is full of words containing the Unicode replacement character "�". My script has already produced a dictionary that maps each damaged word to its correct form as key/value pairs. It looks like this:
"gew�hlte": "gewählte"
"Betr�ge;": "Beträge;"
The dictionary has about 1200 entries. On the (huge) text file I'm using this command in a loop to make my corrections:
foreach($key in $solutionsDictionary.Keys)
{
#Replace the key with value.
[String]$value = $solutionsDictionary[$key]
(Get-Content -encoding UTF8 $file) -replace [Regex]::Escape($key), "$value" | Set-Content -encoding UTF8 $file
}
But it is incredibly slow, since every one of the roughly 1200 iterations reads and rewrites the whole file. To speed it up, I would like to filter for the lines that actually contain this character and then correct only those lines, using each damaged word as the key into my dictionary instead of trying every key until I find the right one. However, I do not know how to write a single line back to the file within the iteration and then continue with the next one. The new, incomplete algorithm looks like this:
$SearchCharacter = '�'
$lines = get-content $file -encoding UTF8 | select-string $SearchCharacter
foreach ($line in $lines)
{
# Split into words and find the ones which contain the searchCharacter
$words = -split $line
$words = @($words) -match $SearchCharacter
foreach ($word in $words){
# Replace each word in the line, using the word as the dictionary key.
# Code missing here. How to write back a single line?
}
}
If Select-String is the problem, I can do the replacement without it. Any suggestions on how to do this? Thanks a lot!
Edit: The following solution came up:
$SearchCharacter = '�'
Get-Content $file -encoding UTF8 |
ForEach-Object {
If ($_.Contains($SearchCharacter)) {
$Words = $_ -Split '\s+'
$words = @($words) -match $SearchCharacter
ForEach ($Word in $Words) {
If ($solutionsDictionary.ContainsKey($Word))
{
$_.Replace([Regex]::Escape($Word), $solutionsDictionary[$Word])
}
}
}
$_
} | Set-Content -encoding UTF8 $Outfile
It works so far, but it has another disadvantage: the target file receives one additional line for each corrected word. I just don't see how to prevent this. For example, with this input:
Das hier r�ckg�ngig ist das zu machen
r�ckg�ngig : ist bereits geamcht
Weitere W�rter gibt ers zu korrigieren
Hier noch ein bl�des Wort
zwei in einer Zeile G�hte und Gr��e
I get this output:
Das hier rückgängig ist das zu machen
Das hier r�ckg�ngig ist das zu machen
rückgängig : ist bereits geamcht
r�ckg�ngig : ist bereits geamcht
Weitere Wörter gibt ers zu korrigieren
Weitere W�rter gibt ers zu korrigieren
Hier noch ein blödes Wort
Hier noch ein bl�des Wort
zwei in einer Zeile Göhte und Gr��e
zwei in einer Zeile G�hte und Größe
zwei in einer Zeile G�hte und Gr��e
So how do I prevent PowerShell from writing a new line for every correction?
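As far as I can tell, the extra lines come from PowerShell's implicit output: String.Replace() returns a new string, and any expression result that is not assigned or discarded inside the script block is sent down the pipeline. A minimal sketch that reproduces this in isolation, assuming a hypothetical one-entry dictionary ($dict, $bad and $line are illustrative names, not part of my script) and passing the word to Replace() directly to keep it short:

```powershell
# Hypothetical one-entry dictionary, for illustration only.
$bad  = [string][char]0xFFFD          # the replacement character
$dict = @{ "bl${bad}des" = 'blödes' }
$line = "Hier noch ein bl${bad}des Wort"

# Replace() returns a new string. Because the result is not assigned,
# it is emitted to the pipeline, and the unchanged $_ is emitted as well.
$duplicated = @($line | ForEach-Object {
    foreach ($word in ($_ -split '\s+')) {
        if ($dict.ContainsKey($word)) { $_.Replace($word, $dict[$word]) }
    }
    $_
})

$duplicated.Count   # 2: the corrected copy plus the original line
```

So one input line yields one output line per corrected word, plus the untouched original.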
Edit 2:
The correct solution is to assign the result of Replace() back to $_:
$SearchCharacter = '�'
Get-Content $file -encoding UTF8 |
ForEach-Object {
If ($_.Contains($SearchCharacter)) {
$Words = $_ -Split '\s+'
$words = @($words) -match $SearchCharacter
ForEach ($Word in $Words) {
If ($solutionsDictionary.ContainsKey($Word))
{
# String.Replace() is literal, so the word must not be regex-escaped.
$_ = $_.Replace($Word, $solutionsDictionary[$Word])
}
}
}
$_
} | Set-Content -encoding UTF8 $Outfile
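To check the behaviour without touching any files, the same approach can be run on in-memory lines. This is only a sketch under assumptions: the dictionary below is a hypothetical stand-in covering part of the sample input, $inputLines and $result are illustrative names, and the word is passed to Replace() unescaped because String.Replace() treats its argument literally.

```powershell
$bad = [string][char]0xFFFD   # the replacement character

# Hypothetical stand-in for $solutionsDictionary, covering the sample lines only.
$solutionsDictionary = @{
    "r${bad}ckg${bad}ngig" = 'rückgängig'
    "bl${bad}des"          = 'blödes'
    "G${bad}hte"           = 'Göhte'
    "Gr${bad}${bad}e"      = 'Größe'
}

$inputLines = @(
    "Das hier r${bad}ckg${bad}ngig ist das zu machen",
    "Hier noch ein bl${bad}des Wort",
    "zwei in einer Zeile G${bad}hte und Gr${bad}${bad}e"
)

$result = @($inputLines | ForEach-Object {
    $line = $_
    if ($line.Contains($bad)) {
        # Only look up the words that actually contain the replacement character.
        foreach ($word in @($line -split '\s+') -match $bad) {
            if ($solutionsDictionary.ContainsKey($word)) {
                # Assigning the result keeps it out of the pipeline.
                $line = $line.Replace($word, $solutionsDictionary[$word])
            }
        }
    }
    $line   # exactly one output line per input line
})

$result.Count   # 3
```

Each input line now produces exactly one output line, including the line with two corrected words.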