1

Background: I changed the filenames of .mp4 videos to lowercase and replaced the special characters as well as the spaces. Now I have to change the associated URLs inside .txt files in the same way. There are many text files, each containing plenty of these URLs referring to the videos.

Issue: I need to replace the special characters in every string between "flashplayer" and "/flashplayer" in every text file, but must not change anything outside the flashplayer tags.

I don't know how to select the strings between "flashplayer" and "/flashplayer" for the replacement.

Sample string:

(flashplayer width="640" height="480" position="1")file=/wiki/data/media/sales/a/ö 2.mp4&config=/wiki/lib/plugins/flashplayer/config_video.xml&start=0(/flashplayer)

This sample is included in a text file (a DokuWiki page). The parentheses stand in for the angle brackets of the tags.

Sample output string:

(flashplayer width="640" height="480" position="1")file=/wiki/data/media/sales/a/oe_2.mp4&config=/wiki/lib/plugins/flashplayer/config_video.xml&start=0(/flashplayer)

The replacement with rename-item should be:

  • ä = ae
  • ö = oe
  • ü = ue
  • ' ' = '_'

Update: the script looks like:

# vars (user input)
$source = "d:\here\name\test\pages"
$search = '(\<flashplayer.*?\>file\=/wiki/87sj38d/media)(.*?)(\<\/flashplayer\>)'
$a = 1
Write-Host "`nSource:`t $source`n"
# replace special characters
gci $source -r -Filter *.txt | ForEach-Object {
    $text = Get-Content $_.FullName | ForEach-Object {
        if($_ -match $search) {
            $_ -replace [Regex]::Escape($Matches[2]), ($Matches[2] -replace 'ö', 'oe' -replace 'ä', 'ae' -replace 'ü', 'ue' -replace '\s', '_')
            $output = $Matches[2]
            $tags = $a++         
            Write-Host "`nTag $tags : $output"
        } else {
            $_
        }
    }
    $text | Set-Content $_.FullName
}

The textfiles contain a line of code like this:

{{backlinks>path:product:description:kennwort_aendern}}

The script works only if I delete this line. Otherwise the string between the flashplayer tags stays the same. Confusingly, the replacement sometimes works and sometimes does not. The string between the flashplayer tags can contain many special characters. See the sample string:

<flashplayer_width="640"_height="480"_position="1">file=/wiki/87sj38d/media/ab/any/test/1001_Grundlagen Kennwort ändern.mp4&image=/wiki/87sj38d/media/ab/any/test/1001_Grundlagen Kennwort ändern.jpg&config=/wiki/lib/plugins/flashplayer/config_video.xml&start=0</flashplayer>

Write-Host $output shows all strings correctly, but the replacement doesn't work properly.

fuchur2502

2 Answers

2

You can try something like this. For each text file, it replaces the special characters on every line that contains a flashplayer tag pair.

Get-ChildItem -Path "c:\FolderOfTextfiles" -Filter *.txt | ForEach-Object {

    $text = Get-Content $_.FullName | ForEach-Object {
        if($_ -match '(?<=\(flashplayer.*?\))(.*?)(?=\(/flashplayer\))') {
            $_ -replace [Regex]::Escape($Matches[1]), ($Matches[1] -replace 'ö', 'oe' -replace 'ä', 'ae' -replace 'ü', 'ue' -replace '\s', '_')
        } else {
            $_
        }
    }

    $text | Set-Content $_.FullName

}
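
The `[Regex]::Escape` call matters because the captured text is reused as a search pattern, and `-replace` treats its first operand as a regular expression. A minimal sketch of the difference, using a made-up sample line (the variable names and the sample are assumptions, not taken from the real wiki pages):

```powershell
# Hypothetical text between the tags; note the "|" and the "."
$inner = 'a/ö b.mp4|x'
$line  = '(flashplayer)' + $inner + '(/flashplayer)'
$fixed = $inner -replace 'ö', 'oe' -replace '\s', '_'

# Unescaped: "|" is read as regex alternation, so "a/ö b.mp4" and "x"
# are each replaced by the whole fixed string.
$unescaped = $line -replace $inner, $fixed
# → (flashplayer)a/oe_b.mp4|x|a/oe_b.mp4|x(/flashplayer)

# Escaped: the captured text is matched literally, exactly once.
$escaped = $line -replace [Regex]::Escape($inner), $fixed
# → (flashplayer)a/oe_b.mp4|x(/flashplayer)
```

The unescaped result shows the same kind of duplication that a `|` between the tags can cause.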

UPDATE: If the text contains line breaks, you could try this global multiline regex matching approach:

$s = @'
<flashplayer_width="640"_height="480"_position="1">file=/wiki/87sj38d/media/ab/any/test/1001_Grundlagen Kennwort ändern.mp4&image=/wiki/87sj38d/media/ab/
any/test/1001_Grundlagen Kennwort ändern.jpg&config=/wiki/lib/plugins/flashplayer/config_video.xml&start=0</flashplayer>
<flashplayer_width="640"_height="480"_position="1">file=/wiki/87sj38f/media/ab/any/test/1001_Grundlagen Kennwort ändern.mp4&image=/wiki/87sj38d/media/ab/any/test/1001_Grundlagen Kennwort ändern.jpg&
config=/wiki/lib/plugins/flashplayer/config_video.xml&start=0</flashplayer>
'@

#Read text as single string
#PS 3.0+
#$s = Get-Content .\test.txt -Raw

#PS 2.0
#$s = Get-Content .\test.txt | Out-String

$s = [regex]::Replace($s, '(?s)(?<=<flashplayer.*?>file=/wiki/87sj38d/media).*?(?=</flashplayer>)', { 
    param([System.Text.RegularExpressions.Match]$m)
    $m.Value -replace 'ö', 'oe' -replace 'ä', 'ae' -replace 'ü', 'ue' -replace ' ', '_'
})

$s    

#Save
#$s | Set-Content .\test.txt

This is a slightly more complicated solution because, AFAIK, you can't modify $1 (the captured group) when using -replace 'pattern', '$1' in the current PowerShell version. If someone has a better solution, please share :)
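
As a variation on the approach above, here is a minimal sketch that wraps the replacement in a function. It is not the code from this answer: `Convert-FlashplayerUrl` is a hypothetical name, and the pattern uses capture groups with `[^>]*` instead of lookarounds and drops the `file=/wiki/87sj38d/media` prefix, on the assumption that every flashplayer tag pair should be processed. The commented-out usage assumes the files are UTF-8.

```powershell
# Hypothetical helper: rewrites only the text between flashplayer tags.
function Convert-FlashplayerUrl {
    param([string]$Text)

    # (?s) lets "." match line breaks, so a tag pair may span several lines.
    # Group 1 = opening tag, group 2 = text between the tags, group 3 = closing tag.
    $pattern = '(?s)(<flashplayer[^>]*>)(.*?)(</flashplayer>)'

    [regex]::Replace($Text, $pattern, {
        param([System.Text.RegularExpressions.Match]$m)
        # Only group 2 is changed; the tags are put back untouched.
        $m.Groups[1].Value +
            ($m.Groups[2].Value -replace 'ö', 'oe' -replace 'ä', 'ae' -replace 'ü', 'ue' -replace ' ', '_') +
            $m.Groups[3].Value
    })
}

# Usage sketch (assumes UTF-8 files; -Raw needs PS 3.0+, -NoNewline needs PS 5.0+):
# Get-ChildItem -Path "c:\FolderOfTextfiles" -Filter *.txt | ForEach-Object {
#     $raw = Get-Content $_.FullName -Raw -Encoding UTF8
#     Convert-FlashplayerUrl $raw | Set-Content $_.FullName -Encoding UTF8 -NoNewline
# }
```

Because the tags are consumed by the match rather than checked with lookarounds, text that sits between one closing tag and the next opening tag is never part of group 2. Note that `-replace` is case-insensitive, so an uppercase `Ö` would also become `oe`.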

Frode F.
  • The input string (.*?) can contain a | character. If there is a | between the flashplayer tags, the script doesn't work and appends the captured string to the rewritten string. PowerShell can treat special characters as literals when they are preceded by a '\'. Is there a way to treat the input string purely as a literal string and ignore such characters? Below are the input and the wrong output string: `[[http://a/b/docs/d/e | description ]]` `[[http://a/b/docs/d/e | description|a/b/docs/d/e | description ]]` The existing special characters are processed correctly – fuchur2502 Jul 28 '14 at 06:20
  • Try the updated answer. I've added an escape method in the `-replace` command to make sure it ignores the special characters. If it doesn't work, could you provide a string that doesn't work that follows the pattern? `(flashplayer .....) sajdkaljdlsadkasd (/flashplayer)`. And please update your question when providing code. It's hard to understand code in comments. :) – Frode F. Jul 28 '14 at 07:58
  • So my update didn't work? Is it a typo in the new sample? You've replaced `(flashplay...)` with `<flashplay...>`. – Frode F. Jul 29 '14 at 12:14
  • First, my "greater than" and "less than" signs weren't accepted by this post, so I changed them to brackets. The updated sample is at least a correct view of the string. The search term in $search does the replacement most of the time, but not when the backlinks code line (see update) is in a text file. Is my adjusted search term correct? – fuchur2502 Jul 29 '14 at 12:41
  • `<` and `>` are accepted when you put them inside code blocks, which you should always do (someone fixed it for you this time). In your `$search` pattern, you have removed my lookaheads/lookbehinds. This breaks the text replacement. Try `$search = '(?<=<flashplayer.*?>file=/wiki/87sj38d/media)(.*?)(?=</flashplayer>)'` – Frode F. Jul 29 '14 at 13:18
  • Thank you! I had to specify the correct encoding for Get-Content and Set-Content. The script works so far. Only if the sample string contains an unwanted line break does the replacement fail, because the closing tag is only recognized on the same line. – fuchur2502 Jul 30 '14 at 06:12
  • That's because `Get-Content` splits the text into separate objects, one per line. You would need to read the text as a single string and run a global multiline match. See the updated answer. – Frode F. Jul 30 '14 at 08:14
  • Thanks for your support. After appending `-Encoding UTF8` to Get-Content and Set-Content, the file encoding stays the same. – fuchur2502 Jul 30 '14 at 09:55
0

Here are the commands you could use to replace the characters mentioned. You will need to change the file path to match the location of the text files. They use Replace-FileString.ps1: http://windowsitpro.com/scripting/replacing-strings-files-using-powershell

./Replace-FileString  -Pattern '(flashplayer)(.*)ä(.*)(\/flashplayer)'  -Replacement '$1$2ae$3$4'  -Path C:\test\*.txt  -Overwrite
./Replace-FileString  -Pattern '(flashplayer)(.*)ö(.*)(\/flashplayer)'  -Replacement '$1$2oe$3$4'  -Path C:\test\*.txt  -Overwrite
./Replace-FileString  -Pattern '(flashplayer)(.*)ü(.*)(\/flashplayer)'  -Replacement '$1$2ue$3$4'  -Path C:\test\*.txt  -Overwrite
./Replace-FileString  -Pattern '(flashplayer)(.*) (.*)(\/flashplayer)'  -Replacement '$1$2_$3$4'  -Path C:\test\*.txt  -Overwrite

It opens and writes all text files (even if it doesn't change anything). It only changes the lines where "ä", "ö", "ü" or " " is found between the strings "flashplayer" and "/flashplayer".
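
One thing worth checking before running these commands on real pages is how the greedy `(.*)` groups behave when a line contains the character more than once. The sketch below uses plain `-replace` on a made-up line, on the assumption that Replace-FileString applies the pattern as an ordinary .NET regex replacement; the sample line and variable names are illustrative only.

```powershell
$pattern = '(flashplayer)(.*) (.*)(\/flashplayer)'
$line    = '<flashplayer width="640">file=/a/b c d.mp4</flashplayer>'

# Each pass replaces only ONE space: the last one before "/flashplayer",
# because the first (.*) is greedy and the match spans the whole tag pair.
$pass1 = $line  -replace $pattern, '$1$2_$3$4'
# → <flashplayer width="640">file=/a/b c_d.mp4</flashplayer>

$pass2 = $pass1 -replace $pattern, '$1$2_$3$4'
# → <flashplayer width="640">file=/a/b_c_d.mp4</flashplayer>

# A further pass reaches the space inside the opening tag itself.
$pass3 = $pass2 -replace $pattern, '$1$2_$3$4'
# → <flashplayer_width="640">file=/a/b_c_d.mp4</flashplayer>
```

Under that assumption, a filename with several spaces or umlauts needs one run per occurrence, and running the space command too often also touches the attributes in the opening tag, since they sit between "flashplayer" and "/flashplayer" as well.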

Kokkie