In order, I need to:
1) grab all links from a txt file

http://example1.htm
http://example2.htm
http://example3.htm
...

2) get the source of each link
3) extract my strings from the source
4) export the strings to CSV
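Steps 1 and 2 can be sketched as follows. This is a minimal sketch that assumes the text file holds one URL per line and uses the same System.Net.WebClient class the question uses. `Read-LinkList` and `Get-LinkSource` are hypothetical helper names, not existing cmdlets.

```powershell
# Hypothetical helper for step 1: read one URL per line,
# trimming whitespace and skipping blank lines
function Read-LinkList {
    param([Parameter(Mandatory)] [string] $LinkFile)
    Get-Content -Path $LinkFile |
        ForEach-Object { $_.Trim() } |
        Where-Object { $_ -ne '' }
}

# Hypothetical helper for step 2: download each page's source as one string
function Get-LinkSource {
    param([Parameter(Mandatory)] [string[]] $Url)
    $client = New-Object System.Net.WebClient
    try {
        foreach ($u in $Url) {
            # One object per link keeps the URL next to its HTML for later steps
            [pscustomobject]@{ Url = $u; Html = $client.DownloadString($u) }
        }
    }
    finally {
        $client.Dispose()
    }
}
```

Usage: `$pages = Get-LinkSource -Url (Read-LinkList -LinkFile .\links.txt)`. Each item's `Html` is the page text itself, so it can be searched with `-match` or `[regex]::Match`. `Select-String -Path` expects file paths, not page text.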

It works with one link. Example:

$topic1 = "kh_header.><b>((?<=)[^<]+(?=</b>))"
# "Numer ogłoszenia" is Polish for "announcement number"
$topic2 = "<b>Numer ogłoszenia:\s([^;]+(?=;))"
Select-String -Path strona1.htm -Pattern $topic1 | ForEach-Object {
    # Discard the Boolean result of -match; only $matches is needed
    $_.Line -match $topic1 | Out-Null
    $out1 = $matches[1]
}
Select-String -Path strona1.htm -Pattern $topic2 | ForEach-Object {
    $_.Line -match $topic2 | Out-Null
    $out2 = $matches[1]
}
# Build one "value1;value2;" line and write it to the file
"$out1;$out2;" | Set-Content out.csv -Force

But I can't get it to work with many links from the txt file. I tried this:

$topic = "kh_header.><b>((?<=)[^<]+(?=</b>))"
$topic2 = "<b>Numer ogłoszenia:\s([^;]+(?=;))"
$folder = Get-ChildItem e:\sk\html
ForEach ($htmfile in $folder) {
    If ($_.extension -eq ".htm") {
        $htmfile = ForEach-Object {
            $WC = New-Object net.webclient
            $HTMLCode = $WC.Downloadstring($_.fullname)
        }
        Select-String -Path $HTMLCode -pattern $topic | foreach-object {
            $_.line -match $topic > $nul
            $out1 = $matches[1]
        }
        Select-String -Path $HTMLCode -pattern $topic2 | foreach-object {
            $_.line -match $topic2 > $nul
            $out2 = $matches[1]
        }
        echo $out1';'$out2';' | Set-content out.csv -force
    }
}

How can I get it?

1 Answer

By default, Select-String finds only the first match on each line. Use the -AllMatches parameter to get every match, e.g.:

foo.txt contains: "static void Main(string[] args)"

Select-String foo.txt -pattern '\W([sS]..)' -AllMatches | 
    Foreach {$_.Matches} |
    Foreach {$_.Groups[1].Value}
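To see what -AllMatches changes, here is a small sketch that pipes an in-memory string into Select-String with and without the switch. The sample line and the pattern are assumptions chosen for illustration.

```powershell
# Assumed sample: one line containing three words that start with 's'
$line = 'sun sea sky'

# Without -AllMatches: only the first match on the line is reported
$first = $line | Select-String -Pattern 's\w+' |
    ForEach-Object { $_.Matches } |
    ForEach-Object { $_.Value }

# With -AllMatches: every match on the line is reported
$all = $line | Select-String -Pattern 's\w+' -AllMatches |
    ForEach-Object { $_.Matches } |
    ForEach-Object { $_.Value }

"First only: $first"         # First only: sun
"All: $($all -join ', ')"    # All: sun, sea, sky
```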

Also, Select-String is line-oriented, so it won't find matches that span lines. To find those, read the whole file in as a single string, e.g.:

$text = [io.file]::readalltext("$pwd\foo.txt")

Then use the inline regex options (?s) (single-line mode, so `.` also matches newlines) and (?i) (case-insensitive), e.g.:

$text | Select-String -pattern '(?si)\W([sS]..)' -AllMatches |
        Foreach {$_.Matches} |
        Foreach {$_.Groups[1].Value}
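The line-oriented behaviour can be seen in a small sketch. The sample fragment is an assumption. When a `<b>...</b>` element is split across two lines, matching line by line finds nothing, while matching the whole text as one string with `(?s)` does. On PowerShell 3.0 and later, `Get-Content -Raw` is another way to read a file as one string.

```powershell
# Assumed sample: the <b> element's text is split across two lines
$html = "<b>first`nline</b>"

# Line by line, as Select-String does when reading a file:
# neither line contains both <b> and </b>, so nothing matches
$perLine = $html -split "`n" | Select-String -Pattern '<b>(.+?)</b>'

# Whole text as one string; (?s) lets '.' match the newline too
$whole = $html | Select-String -Pattern '(?s)<b>(.+?)</b>' -AllMatches |
    ForEach-Object { $_.Matches } |
    ForEach-Object { $_.Groups[1].Value }

"Per line matched nothing: $($null -eq $perLine)"   # Per line matched nothing: True
"Whole: $($whole -replace "`n", ' ')"               # Whole: first line
```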
Keith Hill
  • `$folder = Get-ChildItem e:\sk\html\; $out = Select-String -Path $folder -pattern $topic1, $topic2 -AllMatches | foreach {$_.Matches} | foreach {$_.Groups[1].Value}; $out | format-table value -auto; $out | select *, @{N='Value'; E={$_}} | ConvertTo-csv | Out-file s5.csv -Force`
    How do I get ';' between $topic1 and $topic2? In this example the script writes every match to its own line in the output file. How do I get a simple column layout with records?
    – Gandalf Smith Dec 16 '12 at 11:14
  • I mean: simple columns (topic1, topic2, topic3), with each record holding the strings matched for those columns – Gandalf Smith Dec 16 '12 at 11:21
  • The capture groups are numbered based on the position of `()` in your regexes. The number of the capture group (starting at 1) corresponds to the number in `$_.Groups[n].Value`. Fiddle with `n` and with how you're using capture groups in your regex. – Keith Hill Dec 16 '12 at 17:05
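The follow-up in the comments asks for one row per page, one column per topic, and ';' between the columns. Here is a minimal sketch. It takes group 1 of each of the question's patterns (in both patterns the first capturing `(` is group 1, because `(?<=)` and `(?=...)` do not capture). It then builds one object per page and lets ConvertTo-Csv write the columns with `-Delimiter ';'`. The two sample pages are hard-coded stand-ins for downloaded HTML. `Get-FirstGroup`, `Topic1` and `Topic2` are hypothetical names. `-Delimiter` and `-NoTypeInformation` are real parameters of ConvertTo-Csv and Export-Csv.

```powershell
# Patterns from the question; group 1 in each captures the wanted text.
# "Numer ogłoszenia" is Polish for "announcement number".
$topic1 = "kh_header.><b>((?<=)[^<]+(?=</b>))"
$topic2 = "<b>Numer ogłoszenia:\s([^;]+(?=;))"

# Hypothetical helper: value of group 1 of the first match, or '' if none
function Get-FirstGroup([string] $Text, [string] $Pattern) {
    $m = [regex]::Match($Text, $Pattern)
    if ($m.Success) { $m.Groups[1].Value } else { '' }
}

# Assumed input: page sources already downloaded (hard-coded here)
$pages = @(
    '<td class="kh_header"><b>Title A</b></td><b>Numer ogłoszenia: 111-2012;</b>',
    '<td class="kh_header"><b>Title B</b></td><b>Numer ogłoszenia: 222-2012;</b>'
)

# One object per page -> one CSV row; property names become the column headers
$rows = foreach ($html in $pages) {
    [pscustomobject]@{
        Topic1 = Get-FirstGroup $html $topic1
        Topic2 = Get-FirstGroup $html $topic2
    }
}

# ';' between columns instead of the default ','
$csv = $rows | ConvertTo-Csv -Delimiter ';' -NoTypeInformation
$csv
```

To write a file instead, pipe `$rows` to `Export-Csv -Path out.csv -Delimiter ';' -NoTypeInformation`. The question calls `Set-Content out.csv -force` inside the loop, which overwrites the file for every page. This version writes all rows at once.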