I've got a log file with some bad characters in it. While there are many, the one I'm specifically interested in at the moment is ÿ.

When I try to do a simple select-string with it, I get no results at all:

select-string -path D:\logs\*.log -Pattern 'ÿ'

I have tried adding encoding, but that didn't return any results either. I tried all of the following:

select-string -path D:\logs\*.log -Pattern 'ÿ' -Encoding "Unicode"
select-string -path D:\logs\*.log -Pattern 'ÿ' -Encoding "UTF8"
select-string -path D:\logs\*.log -Pattern 'ÿ' -Encoding "ASCII"

What am I missing?

TulsaNewbie
  • Testing for that character I am able to get matches. I feel confident that the problem is in the encoding of the file. If you read the file with 'get-content' does it display those characters? If you open the file in notepad++ (if you have it), does it show you the encoding? – shadow2020 May 20 '21 at 20:44
  • hmm, notepad++ says it's ANSI, which I believe should be the default one for powershell. – TulsaNewbie May 20 '21 at 20:51
  • Can you actually see the `ÿ` when you open the file in Notepad++? You're unlikely to see a `ÿ` in an ANSI file because it would have been converted to a question mark `?` as a result of the encoding when the file was written. – mclayton May 20 '21 at 21:07
  • I can. Hmm... I am taking my "it's ANSI" thought from clicking on Encoding at the top and seeing ANSI is the one with the dot. That could be incorrect. – TulsaNewbie May 20 '21 at 21:42
  • @TulsaNewbie, use [`Format-Hex`](https://learn.microsoft.com/powershell/module/microsoft.powershell.utility/format-hex) with a single `.log` file (use the `-LiteralPath` parameter, don't pipe) to find out how the `ÿ` is represented at a byte level in the file(s). – mklement0 May 20 '21 at 22:06
  • for me using `-Encoding default` on PowerShell 5.1 works. There is a difference though. PS 5.1 encoding `default` uses the encoding that corresponds to the system's active code page (usually ANSI). For PS 7.1 the default value is utf8NoBOM. – Theo May 21 '21 at 10:50
  • @Theo haha that was it... I hadn't actually tried "default" as the encoding type. – TulsaNewbie May 21 '21 at 17:20
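
To make the encoding mismatch from these comments concrete, here is a minimal sketch rather than a definitive diagnosis. It assumes the file's ANSI code page is Windows-1252 (the thread only says "ANSI"), and the temp file name `sample-ansi.log` is hypothetical. It writes a line containing the single byte 0xFF, inspects it with `Format-Hex`, and shows that the same bytes contain `ÿ` when decoded as Windows-1252 but not when decoded as UTF-8.

```powershell
# U+00FF is "ÿ"; built from its code point so the script's own encoding does not matter.
$yUmlaut = [char]0xFF

# Assumption: the system's ANSI code page is Windows-1252.
$ansi = [System.Text.Encoding]::GetEncoding(1252)

# Hypothetical sample file: in Windows-1252, "ÿ" is stored as the single byte 0xFF.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'sample-ansi.log'
[System.IO.File]::WriteAllBytes($path, $ansi.GetBytes("bad char: $yUmlaut here`r`n"))

# Look at the raw bytes, as suggested in the comments (use -LiteralPath, do not pipe).
Format-Hex -LiteralPath $path

# Decode the same bytes two ways.
$bytes  = [System.IO.File]::ReadAllBytes($path)
$asAnsi = $ansi.GetString($bytes)                          # 0xFF becomes "ÿ"
$asUtf8 = [System.Text.Encoding]::UTF8.GetString($bytes)   # 0xFF is invalid UTF-8, becomes U+FFFD

$asAnsi.Contains("$yUmlaut")   # True
$asUtf8.Contains("$yUmlaut")   # False

# Per the comment above: in Windows PowerShell 5.1, "Default" means the active ANSI code page,
# so this finds the line there. In PowerShell 7.x "Default" is utf8NoBOM instead.
Select-String -LiteralPath $path -Pattern $yUmlaut -Encoding Default

Remove-Item -LiteralPath $path
```

If `Format-Hex` shows `FF` at the position of the character, the file is a single-byte (ANSI-style) encoding, which is consistent with `-Encoding default` working on PowerShell 5.1.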

1 Answer

Try to use a regex with a negative lookahead. I think this is the easiest way since we don't know much about the file or the character encoding.

-pattern '^(?!.*[a-zA-Z]|\S\s|^$).*$'

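As written, the lookahead rejects a line if it contains an ASCII letter (a thru z, lowercase or uppercase) anywhere, if it starts with a non-whitespace character followed by whitespace, or if it is empty; any other line is matched in full. Digits are not excluded. In theory it should capture a line holding your "ÿ" character or any other weirdo characters that may be showing up, but only if that line contains no ASCII letters. You can add more to it if you need to. The | symbol means "OR", so you can add a "|" after the $ in the pattern and append further alternatives if necessary, like ^(?!.*[a-zA-Z]|\S\s|^$).*$|\=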
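
As a point of comparison, and not as part of the original answer, a negated character class such as `[^\x00-\x7F]` matches any single character outside the ASCII range. The sketch below runs both patterns against three made-up sample lines; the variable names are illustrative only. It works on in-memory strings, so it sidesteps file encoding: when searching a file, the file still has to be decoded with the right `-Encoding` first.

```powershell
$yUmlaut = [char]0xFF   # "ÿ", built from its code point

# Made-up sample lines for illustration.
$lines = @(
    'plain ASCII line 123',
    "line with $yUmlaut inside",
    '12345 ---'
)

# Any character outside the ASCII range. -CaseSensitive keeps case folding out of the picture.
$nonAscii  = '[^\x00-\x7F]'
$asciiHits = @($lines | Select-String -Pattern $nonAscii -CaseSensitive)
$asciiHits.Line      # only the line containing "ÿ"

# The lookahead pattern from the answer matches whole lines that contain no ASCII letters.
$lookahead = '^(?!.*[a-zA-Z]|\S\s|^$).*$'
$lookHits  = @($lines | Select-String -Pattern $lookahead)
$lookHits.Line       # only '12345 ---'
```

On these samples the negated class flags the line with `ÿ`, while the lookahead pattern skips it because that line also contains letters.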

shadow2020