I am creating a script that copies a file, renames it, and then removes certain special characters from its contents. One of these is some sort of apostrophe that I cannot type on my keyboard. I can copy and paste it, but the replace function doesn't work on it.

The script opens the file, searches for the strange apostrophe ’ and replaces it with nothing. I'd like to replace it with a normal apostrophe, but I don't know how that is done. At present the biggest problem is that I can't get it to "see" the strange apostrophe that winds up in the autogenerated file I'm modifying. Any help is much appreciated. Thanks :)

Apostrophe in file: ’

Normal Apostrophe: '

This is a chunk of the batch that I've isolated to test with.

    @echo off

    set YYMMDD=%DATE:~-2,2%%DATE:~-7,2%%DATE:~-10,2%
    set DDMMYYYY=%DATE:~-10,2%%DATE:~-7,2%%DATE:~-4,4%
    set YYYY-MM-DD=%DATE:~-4,4%-%DATE:~-7,2%-%DATE:~-10,2%

    powershell -Command "(gc 'C:\LOCATION\Client_List_%DDMMYYYY%.csv') -replace '’', '' | Out-File 'C:\LOCATION\Client_List_%DDMMYYYY%.csv'"

    Echo Done
meeilz
  • What is the ASCII code for the strange apostrophe? BTW, the backtick character looks a little like a strange apostrophe, but not like the one you've shown us. The backtick character is used as an escape in PS strings. – Walter Mitty Jan 05 '17 at 15:09
  • Does `echo ’` in the batch file print that special apostrophe (to make sure it is not an encoding problem)? Also, you need to escape that special apostrophe inside a single-quoted string, because PowerShell treats it as a valid single-quote character: `-replace '’’', ''`. – user4003407 Jan 05 '17 at 15:14
  • You may be able to do a regex. If I copy and paste the weird apostrophe into regex101 it does recognize that it's different. Even if you can't figure out what it is, that may at least enable you to replace it. – Nick Jan 05 '17 at 15:56
  • Thanks for the quick replies! @PetSerAl if I echo the special apostrophe, it gives me the AE symbol (latin) so it's struggling to pick up what it is exactly. WalterMitty I'm unsure, I cannot replicate it outside of copy & paste. Nick How would I implement a Regex in my batch script? The PS component is a one-liner so I'm not sure how it'd be done. Thanks for your help guys, hopefully we can get it sorted! :) – meeilz Jan 05 '17 at 16:19
  • Adding to this, if I do a write-host in PS, the funny apostrophe displays differently to the normal one, so a PS ''echo'' works for that character where batch does not. – meeilz Jan 05 '17 at 16:27
  • @meeilz Show the output of this PowerShell command: `([Console]::InputEncoding, [Console]::OutputEncoding, [Text.Encoding]::Default) | % CodePage`. What encoding do you use for the batch file? – user4003407 Jan 05 '17 at 16:37

2 Answers

set "fileIn=C:\LOCATION\Client_List_%DDMMYYYY%.csv"
set "fileOu=C:\LOCATION\Client_List_%DDMMYYYY%.csv"
powershell -c "(gc '%fileIn%').Replace('‘‘','').Replace('’’','')|Out-File '%fileOu%'"

That strange apostrophe is U+2019 Right Single Quotation Mark, which is supposed to be a closing quote. It could be paired with a different opening quote; in the example above, that opening quote is U+2018 Left Single Quotation Mark.
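If it is unclear which character a file really contains, listing the code points of its non-ASCII characters is one way to check. The following is only a minimal sketch: `$line` is a hypothetical sample, not data from the question, and the odd character is built from its code point so that the script itself stays pure ASCII.

```powershell
# Hypothetical sample line (an assumption, not taken from the real CSV).
$line = 'O' + [char]0x2019 + 'Brien, "Smith"'

# List every non-ASCII character in the line as U+XXXX.
$codePoints = $line.ToCharArray() |
    Where-Object { [int]$_ -gt 127 } |
    ForEach-Object { 'U+{0:X4}' -f [int]$_ }

$codePoints   # U+2019
```

The same pipeline could presumably be fed from `Get-Content` instead of a sample string, provided the file is read with the encoding it was actually saved in.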

Get-Help 'about_Quoting_Rules' says

Quotation marks are used to specify a literal string. You can enclose a string in single quotation marks (') or double quotation marks (").

In fact, PowerShell accepts several variants in each of the two sets of quotes:

  • double quotation marks: " “ ” „
  • single quotation marks: ' ‘ ’ ‚ ‛
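A small sketch of what this means in practice: a literal delimited by typographic single quotes parses like an ordinary single-quoted string, and doubling such a quote inside the literal escapes it. The source text is assembled from code points and evaluated with `[scriptblock]::Create`, purely so that the snippet does not depend on the encoding it is saved in; the variable names are illustrative only.

```powershell
# Build the source text from code points so this script itself stays pure ASCII.
$open  = [string][char]0x2018   # U+2018 Left Single Quotation Mark
$close = [string][char]0x2019   # U+2019 Right Single Quotation Mark

# A literal delimited by typographic quotes parses like a single-quoted string.
$plain = & ([scriptblock]::Create($open + 'abc' + $close))

# Doubling a quote inside the literal escapes it, yielding one character.
$escaped = & ([scriptblock]::Create("'" + $close + $close + "'"))

$plain                              # abc
$escaped.Length                     # 1
'U+{0:X4}' -f [int][char]$escaped   # U+2019
```

This is why an undoubled ’ inside `'...'` ends the string early instead of being matched as text.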

AFAIK, all of those quotation marks are present in most Windows ANSI code pages (1252, 1250, 1257, 1253, 1251, 1254, 1255, 1256, 1258), so they can be used literally in an ANSI-saved .bat script - except the last single quotation mark, U+201B Single High-Reversed-9 Quotation Mark. For that one, use $([char]0x201B) instead of '‛‛', as follows:

rem        cast [char] to `[string]`    ↓↓↓↓↓↓↓↓
powershell -c "(gc '%fileIn%').Replace( [string]$([char]0x201B) , '')"
rem                                             ↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑

or as follows:

rem [char] can't be empty so specify `[string]`           ↓↓↓↓↓↓↓↓
powershell -c "(gc '%fileIn%').Replace( $([char]0x201B) , [string]'')"
rem                                     ↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑
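The question also asked how to replace the character with a normal apostrophe rather than remove it. A minimal sketch of that, under stated assumptions: the temp file and its two sample lines are hypothetical stand-ins for the CSV, and UTF-8 is only a guess at the real file's encoding. Passing `-Encoding` explicitly on both the read and the write may matter here, because Windows PowerShell reads BOM-less files with the ANSI code page by default and `Out-File` writes UTF-16 by default.

```powershell
# Hypothetical temp file standing in for the CSV; UTF-8 is an assumption.
$path  = [System.IO.Path]::GetTempFileName()
$quote = [string][char]0x2019          # U+2019 Right Single Quotation Mark
$lines = @(('O' + $quote + 'Brien,1'), ('D' + $quote + 'Arcy,2'))
Set-Content -Path $path -Value $lines -Encoding UTF8

# Read with an explicit encoding and replace U+2019 with U+0027 on every line.
$fixed = Get-Content -Path $path -Encoding UTF8 |
    ForEach-Object { $_.Replace($quote, "'") }

# Write the result back with the same encoding.
Set-Content -Path $path -Value $fixed -Encoding UTF8

Get-Content -Path $path -Encoding UTF8   # O'Brien,1 and D'Arcy,2
Remove-Item -Path $path
```

If the real file turns out to use a different encoding, the `-Encoding` value would need to change accordingly.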

Analysis and explanation

The following PowerShell snippet shows an excerpt from the Unicode database (characters whose names end with Quotation Mark or contain Apostrophe):

PS D:\> 0x22,0x27,0x00AB,0x00BB,0x2018,0x2019,0x201A,0x201B,0x201C,0x201D,0x201E,0x201F,
  0x2039,0x203A,0x2E42,0x301D,0x301E,0x301F,0x055A | Get-CharInfo | Format-Table -AutoSize

Char CodePoint                Category Description                               
---- ---------                -------- -----------                               
   " U+0022           OtherPunctuation Quotation Mark                            
   ' U+0027           OtherPunctuation Apostrophe                                
   « U+00AB    InitialQuotePunctuation Left-Pointing Double Angle Quotation Mark 
   » U+00BB      FinalQuotePunctuation Right-Pointing Double Angle Quotation Mark
   ‘ U+2018    InitialQuotePunctuation Left Single Quotation Mark                
   ’ U+2019      FinalQuotePunctuation Right Single Quotation Mark               
   ‚ U+201A            OpenPunctuation Single Low-9 Quotation Mark               
   ‛ U+201B    InitialQuotePunctuation Single High-Reversed-9 Quotation Mark     
   “ U+201C    InitialQuotePunctuation Left Double Quotation Mark                
   ” U+201D      FinalQuotePunctuation Right Double Quotation Mark               
   „ U+201E            OpenPunctuation Double Low-9 Quotation Mark               
   ‟ U+201F    InitialQuotePunctuation Double High-Reversed-9 Quotation Mark     
   ‹ U+2039    InitialQuotePunctuation Single Left-Pointing Angle Quotation Mark 
   › U+203A      FinalQuotePunctuation Single Right-Pointing Angle Quotation Mark
   ⹂ U+2E42           OtherNotAssigned Undefined                                 
   〝 U+301D            OpenPunctuation Reversed Double Prime Quotation Mark      
   〞 U+301E           ClosePunctuation Double Prime Quotation Mark               
   〟 U+301F           ClosePunctuation Low Double Prime Quotation Mark           
   ՚ U+055A           OtherPunctuation Armenian Apostrophe                       

(Output from a modified Get-CharInfo cmdlet.) The original Get-CharInfo module can be downloaded from http://poshcode.org/5234.

The following PowerShell script complements the results above by showing some valid (and, in my locale, invalid) combinations of quotes:

$arrSingleQuotes = 
 ''' U+0027 Apostrophe '''                                ,
 ‘‘‘ U+2018 Left Single Quotation Mark ‘‘‘                ,
 ’’’ U+2019 Right Single Quotation Mark ’’’               ,
 ‚‚‚ U+201A Single Low-9 Quotation Mark ‚‚‚               ,
 ‛‛‛ U+201B Single High-Reversed-9 Quotation Mark ‛‛‛     ,
 ‘‘‘ U+2018 (Left/Right) Single Quotation Mark U+2019 ’’’ ,
 ’’’ U+2019 (Right/Left) Single Quotation Mark U+2018 ‘‘‘
'$arrSingleQuotes (any combination)'
 $arrSingleQuotes

$arrDoubleQoutes = 
 """ U+0022 Quotation Mark """                            ,
 “““ U+201C Left Double Quotation Mark “““                ,
 ””” U+201D Right Double Quotation Mark ”””               ,
 „„„ U+201E Double Low-9 Quotation Mark „„„               ,
 “““ U+201C (Left/Right) Double Quotation Mark U+201D ””” ,
 ””” U+201D (Right/Left) Double Quotation Mark U+201C “““
'$arrDoubleQoutes (any combination)'
 $arrDoubleQoutes

$noQuotes = @"
 « U+00AB Left-Pointing Double Angle Quotation Mark
 » U+00BB Right-Pointing Double Angle Quotation Mark
 ‟ U+201F Double High-Reversed-9 Quotation Mark
 ⹂ U+2E42 DOUBLE LOW-REVERSED-9 QUOTATION MARK
 ‹ U+2039 Single Left-Pointing Angle Quotation Mark
 › U+203A Single Right-Pointing Angle Quotation Mark
〝 U+301D Reversed Double Prime Quotation Mark
 〞U+301E Double Prime Quotation Mark
 〟U+301F Low Double Prime Quotation Mark
 ՚ U+055A Armenian Apostrophe                       
"@
'$noQuotes'
 $noQuotes

Output:

PS D:\> D:\PShell\SO\41488245_quotes.ps1

$arrSingleQuotes (any combination)
' U+0027 Apostrophe '
‘ U+2018 Left Single Quotation Mark ‘
’ U+2019 Right Single Quotation Mark ’
‚ U+201A Single Low-9 Quotation Mark ‚
‛ U+201B Single High-Reversed-9 Quotation Mark ‛
‘ U+2018 (Left/Right) Single Quotation Mark U+2019 ’
’ U+2019 (Right/Left) Single Quotation Mark U+2018 ‘

$arrDoubleQoutes (any combination)
" U+0022 Quotation Mark "
“ U+201C Left Double Quotation Mark “
” U+201D Right Double Quotation Mark ”
„ U+201E Double Low-9 Quotation Mark „
“ U+201C (Left/Right) Double Quotation Mark U+201D ”
” U+201D (Right/Left) Double Quotation Mark U+201C “

$noQuotes
 « U+00AB Left-Pointing Double Angle Quotation Mark
 » U+00BB Right-Pointing Double Angle Quotation Mark
 ‟ U+201F Double High-Reversed-9 Quotation Mark
 ⹂ U+2E42 DOUBLE LOW-REVERSED-9 QUOTATION MARK
 ‹ U+2039 Single Left-Pointing Angle Quotation Mark
 › U+203A Single Right-Pointing Angle Quotation Mark
〝 U+301D Reversed Double Prime Quotation Mark
 〞U+301E Double Prime Quotation Mark
 〟U+301F Low Double Prime Quotation Mark
 ՚ U+055A Armenian Apostrophe                       

Note that ⹂ U+2E42 DOUBLE LOW-REVERSED-9 QUOTATION MARK is present in the Unicode database and is rendered properly in PowerShell ISE, even though the table above reports it as Undefined.

Addendum: I found more quotation mark candidates (only the result obtained from my Excerpt_From_UnicodeDataTxt.ps1 script is shown):

PS > $x = .\tests\Excerpt_From_UnicodeDataTxt.ps1 -SearchString "Quotation|Apostrophe" | 
    Where-Object {$_.Category -match 'Punctuation'}

PS > $x.Count
23

PS > $x

Char CodePoint Category                   Description                                       
---- --------- --------                   -----------                                       
   " U+0022    Po-OtherPunctuation        Quotation Mark                                    
   ' U+0027    Po-OtherPunctuation        Apostrophe                                        
   « U+00AB    Pi-InitialQuotePunctuation Left-Pointing Double Angle Quotation Mark         
   » U+00BB    Pf-FinalQuotePunctuation   Right-Pointing Double Angle Quotation Mark        
   ՚ U+055A    Po-OtherPunctuation        Armenian Apostrophe                               
   ‘ U+2018    Pi-InitialQuotePunctuation Left Single Quotation Mark                        
   ’ U+2019    Pf-FinalQuotePunctuation   Right Single Quotation Mark                       
   ‚ U+201A    Ps-OpenPunctuation         Single Low-9 Quotation Mark                       
   ‛ U+201B    Pi-InitialQuotePunctuation Single High-Reversed-9 Quotation Mark             
   “ U+201C    Pi-InitialQuotePunctuation Left Double Quotation Mark                        
   ” U+201D    Pf-FinalQuotePunctuation   Right Double Quotation Mark                       
   „ U+201E    Ps-OpenPunctuation         Double Low-9 Quotation Mark                       
   ‟ U+201F    Pi-InitialQuotePunctuation Double High-Reversed-9 Quotation Mark             
   ‹ U+2039    Pi-InitialQuotePunctuation Single Left-Pointing Angle Quotation Mark         
   › U+203A    Pf-FinalQuotePunctuation   Single Right-Pointing Angle Quotation Mark        
   ❮ U+276E    Ps-OpenPunctuation         Heavy Left-Pointing Angle Quotation Mark Ornament 
   ❯ U+276F    Pe-ClosePunctuation        Heavy Right-Pointing Angle Quotation Mark Ornament
   ⹂ U+2E42    Ps-OpenPunctuation         Undefined                                         
   〝 U+301D    Ps-OpenPunctuation         Reversed Double Prime Quotation Mark              
   〞 U+301E    Pe-ClosePunctuation        Double Prime Quotation Mark                       
   〟 U+301F    Pe-ClosePunctuation        Low Double Prime Quotation Mark                   
   ＂ U+FF02    Po-OtherPunctuation        Fullwidth Quotation Mark                          
   ＇ U+FF07    Po-OtherPunctuation        Fullwidth Apostrophe                              
JosefZ
  • Hi, thanks for such an incredibly detailed response, this appears to be certainly on the right track. I've run the following line --- powershell -c "(gc 'Client_XXX_List_%date%.csv').Replace( $([char]0x201B) , '')" --- but I get the following response: https://postimg.org/image/u0h4rjruh/ --- any ideas why? Thanks! – meeilz Jan 09 '17 at 09:47
  • @meeilz Answer updated. Sorry for my oversight: I had tested `.Replace( $([char]0x201B) , '#')`, i.e. replacing with another _character_. It now works for _replacing with an empty string_ as well. – JosefZ Jan 09 '17 at 17:49
  • Thanks :) It now passes that initial stage, no errors which is excellent. Currently it will search through the entire document and that appears to be ok but it doesn't find the character, I've narrowed it down to actually being "Right single quotation mark" rather than the reversed one, and I've changed the code to 0x2019 which I believe is correct, but it still doesn't find/replace any of the quotation marks in my csv? Very unusual! Much appreciate your help :) Thanks! -- `powershell -c "(gc 'Client_List_05012017.csv').Replace( $([char]0x2019) , [string]'A')"` – meeilz Jan 11 '17 at 09:15
  • Adding to this, I've fixed it :) Got it working now! It was an error, I hadn't set up the close file component! Thanks for your help. Miles – meeilz Jan 11 '17 at 10:54

I think it's a weird backtick character. At least that's what it's acting like.

If I do this:

$text = "Weird ’ Normal ' Backtick ` Weird ’ "
$text.Replace("’","")

It gives me this:

Weird  Normal ' Backtick Weird

So does this work?

powershell -Command "(gc 'C:\LOCATION\Client_List_%DDMMYYYY%.csv').replace('’’', '') | Out-File 'C:\LOCATION\Client_List_%DDMMYYYY%.csv'"

Doubling a normal backtick makes the script take the character literally. Doubling the weird apostrophe seems to do the same thing; at least, that works in my testing.

Nick
  • Thanks for the reply! In doubling it I don't get any different outcome unfortunately, it still processes and says "done" (my echo at the end) however checking in the csv for the weird apostrophe, it's still there on several lines. Any other ideas? – meeilz Jan 05 '17 at 17:21
  • Can you give me some sample data from your CSV? – Nick Jan 05 '17 at 17:22
  • It's a typographic single quote, not a backtick. – Ansgar Wiechers Jan 06 '17 at 00:38