I'm using a regex on a text file in PowerShell, but it works only if the text doesn't contain a carriage return. I prepared an example file like this:

the duck is on the table --found!  

the elephant is on  the table --found! 

the cat is  
on the table --NOT found!  :-(

the lion is on the tablet --NOT found but ok ;-)

the dog is on  
the table               --NOT found!  :-(

the turtle isonthe table --NOT found but ok ;-)

the cow is on the              table --found! 

I want the cases that contain "on the table", so I execute this:

select-string -path "c:\example.txt" -pattern '([^\w]{1})on([^\w])+the([^\w])+table([^\w]{1})'

This is the output:


example.txt:1:the duck is on the table --found!

example.txt:2:the elephant is on the table --found!

example.txt:14:the cow is on the table --found!


But I also need the cases with a carriage return! Where is the cat? And where is the dog?

Thank you ;-)

mklement0

2 Answers

I'm not sure if this is possible using Select-String, because it goes line by line instead of reading the file as a single multiline string, but this worked for me:

$tmp = New-TemporaryFile

@'
the duck is on the table 

the elephant is on the table 

the cat is
on the table

the lion is on the tablet

the dog is on
the table

the turtle isonthe table

the cow is on the table 
'@ | Set-Content $tmp


$content = Get-Content $tmp -Raw
[regex]::Matches($content, '.*[^\w]on[^\w]+the[^\w]+table[^\w].*') |
Select-Object Index,Value | Format-Table -Wrap

Result:

Index Value                         
----- -----                         
    0 the duck is on the table      
   29 the elephant is on the table  
   62 the cat is                    
      on the table                  
  119 the dog is on                 
      the table                     
  175 the cow is on the table   

If you want to allow only whitespace between the words, it might be better to use:

'.*\son\s+the\s+table\s.*'
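
As a rough illustration of how the two patterns differ, the sketch below runs both against three made-up sample strings (the strings and the $samples / $results variable names are assumptions for this example only). The [^\w] version accepts any non-word character between the words, including punctuation, while the \s version accepts only whitespace, which includes line breaks.

```powershell
# Hypothetical sample strings: plain spaces, punctuation, and a line break.
$samples = 'is on the table ', 'is on, the table!', "is on`nthe table`n"

$results = foreach ($s in $samples) {
  [pscustomobject]@{
    Text       = $s
    # Any non-word character (space, comma, newline, ...) may separate the words.
    NonWord    = $s -match '[^\w]on[^\w]+the[^\w]+table[^\w]'
    # Only whitespace (space, tab, newline, ...) may separate the words.
    Whitespace = $s -match '\son\s+the\s+table\s'
  }
}

$results | Format-Table -Wrap
```

Only the second sample, which has a comma after "on", should differ: it matches the [^\w] pattern but not the \s pattern.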

If you want case-insensitive matching:

[regex]::Matches($content, '.*[^\w]on[^\w]+the[^\w]+table[^\w].*', [System.Text.RegularExpressions.RegexOptions]::IgnoreCase)
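
Unlike PowerShell's -match operator and Select-String, [regex]::Matches is case-sensitive by default. The following is a minimal sketch, using a made-up $text value, of two ways to relax that: passing a RegexOptions value, or using the inline (?i) option at the start of the pattern.

```powershell
# Hypothetical sample text with mixed case and a line break between the words.
$text = "The dog is ON`nthe TABLE "
$pattern = '\son\s+the\s+table\s'

# Case-sensitive by default: "ON" and "TABLE" do not match.
$sensitive = [regex]::Matches($text, $pattern).Count

# Option 1: pass RegexOptions.IgnoreCase as the third argument.
$viaOption = [regex]::Matches(
  $text, $pattern,
  [System.Text.RegularExpressions.RegexOptions]::IgnoreCase
).Count

# Option 2: switch on case-insensitivity inside the pattern itself.
$viaInline = [regex]::Matches($text, "(?i)$pattern").Count

"default: $sensitive; IgnoreCase: $viaOption; (?i): $viaInline"
```

The inline form is convenient when the pattern is handed to a command that does not expose a separate options parameter.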
Santiago Squarzon

With file input provided via Select-String's -Path or -LiteralPath parameters, the target files are processed line by line, as also pointed out in Santiago Squarzon's helpful answer.

In order to match a pattern across lines, a file's content must be passed as a single, multi-line string, which is what Get-Content's -Raw switch does.
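
To see the difference concretely, here is a minimal sketch. The temporary file and its two-line sample text are assumptions made for illustration, not the example file from the question.

```powershell
# Hypothetical two-line sample written to a temporary file (illustration only).
$file = New-TemporaryFile
"the dog is on`nthe table" | Set-Content -LiteralPath $file.FullName

# Without -Raw: an array with one string per line.
$lines = Get-Content -LiteralPath $file.FullName
$lineHits = @($lines -match 'on\s+the\s+table')   # lines that match on their own

# With -Raw: a single string that still contains the line break.
$raw = Get-Content -LiteralPath $file.FullName -Raw
$rawHit = $raw -match 'on\s+the\s+table'

"Lines read: $($lines.Count); matching lines: $($lineHits.Count); raw match: $rawHit"

# Clean up the temporary file.
Remove-Item -LiteralPath $file.FullName
```

Neither line contains the whole phrase by itself, so the line-by-line check finds nothing, while the raw string matches because \s+ also covers the line break.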

Additionally, in order to report multiple matches inside that multi-line string, Select-String's -AllMatches switch must be used.

The resulting matches can then be processed via the .Matches property of the Microsoft.PowerShell.Commands.MatchInfo instance(s) that Select-String outputs:

Get-Content -Raw example.txt | 
  Select-String -AllMatches '(?m)^.*?\son\s+the\s+table\b.*$' |
    ForEach-Object {
      foreach ($match in $_.Matches) {
        "[$($match.Value)]"
      }
    }

For an explanation of the regex used above, see this regex101.com page.[1]

The above yields:

[the duck is on the table]
[the elephant is on  the table]
[the cat is  
on the table]
[the dog is on  
the table]
[the cow is on the              table]
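
As a complement to the linked explanation, the regex from the command above can be written in free-spacing mode with comments, using the .NET inline option x alongside m. This is only an annotated sketch: the $pattern, $text and $found names and the three-line sample text are made up for this illustration.

```powershell
# The same regex, spread out with the inline options m (multiline) and x (free-spacing).
$pattern = @'
(?mx)          # m: ^ and $ match at line boundaries; x: ignore pattern whitespace, allow comments
^ .*?          # from the start of a line, as few characters as possible
\s on          # one whitespace character, then "on"
\s+ the        # one or more whitespace characters, line breaks included, then "the"
\s+ table \b   # whitespace, then "table" ending at a word boundary, which rules out "tablet"
.* $           # the rest of that line
'@

# Hypothetical sample: one phrase split across lines, one near miss.
$text = "the cat is`non the table`nthe lion is on the tablet"

$found = $text |
  Select-String -AllMatches $pattern |
    ForEach-Object { $_.Matches.Value }

"[$found]"
```

With this sample, only the text about the cat should be reported, spanning two lines; the "tablet" line is rejected by the word boundary.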

[1] Note that even though regex101.com, a site for visualizing, explaining and experimenting with regexes, doesn't support the .NET regex engine used by PowerShell, choosing a similar engine, such as Java's, usually exhibits the same behavior, at least fundamentally.

mklement0