3

I have a simple script, myscript.ps1, to extract URLs from files, taken from this tutorial:

$input_path = 'd:\myfolder\*'
$output_file = 'd:\extracted_URL_addresses.txt'
$regex = '([a-zA-Z]{3,})://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)*?'
select-string -Path $input_path -Pattern $regex -AllMatches | % { $_.Matches } | % { $_.Value } > $output_file

I run PowerShell as administrator and then type:

D:/myscript.ps1

But for most of the paths inside d:\myfolder I get:

select-string : The file D:\myfolder\templates cannot be read: Access to the path 'D:\myfolder\templates' is denied.

The folder was copied from an FTP server with WinSCP. I tried opening the folder properties, unticking the "read only" box and clicking Apply, but each time I reopen the properties it is "read only" again (I'm not sure if that's related to the problem).

I work on Windows 10.

PolGraphic
  • Looks like `D:\myfolder\templates` is a folder, not a file Select-String can work with. –  May 25 '17 at 16:18
  • Are you able to browse to see files in D:\myfolder\templates, and if you see files there are you able to open them? This sounds like an ACL issue to me. – TheMadTechnician May 25 '17 at 16:22
  • @TheMadTechnician yes, I am able both to open and browse those folders without any issues. – PolGraphic May 28 '17 at 15:28
  • I don't know the contents of your folders, if any. Add the `-File` parameter to a gci (Get-ChildItem) call and/or supply an extension and you are done. –  May 28 '17 at 15:34

2 Answers

0

To expand on the comment from @LotPings: you could get just the files in D:\myfolder by using the -File parameter of Get-ChildItem. This way you will not pass a directory to Select-String.

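$input_path = 'd:\myfolder'
$output_file = 'd:\extracted_URL_addresses.txt'
$regex = '([a-zA-Z]{3,})://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)*?'
# -File returns only files, so no directory is ever passed to Select-String
$Files = Get-ChildItem $input_path -File | Select-Object -ExpandProperty FullName
# Collect the matches from every file first; redirecting with > inside the loop
# would overwrite the output file on each iteration and keep only the last file's URLs
$urls = Foreach ($File in $Files) {
    select-string -Path $File -Pattern $regex -AllMatches | % { $_.Matches } | % { $_.Value }
}
$urls > $output_file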
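
A minimal sketch of why -File helps, assuming a throwaway folder under the temp directory (the folder name, the `templates` subfolder and `page.html` are made up for illustration, not taken from the question's data). The wildcard path picks up the subfolder as well as the file, while Get-ChildItem -File returns only the file:

```powershell
# Hypothetical scratch folder; all names below are illustrative only
$root = Join-Path ([System.IO.Path]::GetTempPath()) ("urltest_" + [guid]::NewGuid().ToString('N'))
New-Item -ItemType Directory -Path (Join-Path $root 'templates') | Out-Null
Set-Content -Path (Join-Path $root 'page.html') -Value 'see http://example.com'

# The wildcard matches both page.html and the 'templates' directory
$all = Get-ChildItem -Path (Join-Path $root '*')
# -File drops directories, so only page.html is left
$filesOnly = Get-ChildItem -Path $root -File

"Wildcard matched: $($all.Name -join ', ')"
"Files only:       $($filesOnly.Name -join ', ')"

# Clean up the scratch folder
Remove-Item -Recurse -Force $root
```

The -File switch needs PowerShell 3.0 or later, which should not be a problem on Windows 10.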
BenH
0
  • As $input is an automatic variable, I wouldn't use it, not even as part of a variable name.
  • You don't need two stacked ForEach-Object calls; use $_.Matches.Value instead
  • Using a file extension in the path could possibly avoid the error
  • Using the following script on a copy of this webpage works flawlessly but returns a lot of dupes, so I'd append a | Sort-Object -Unique

$FilePath = '.\*.html'
$OutputFile = '.\extracted_URL_addresses.txt'
$regex = '([a-zA-Z]{3,})://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)*?'
Get-ChildItem -File $FilePath |
  Select-String -Pattern $regex -AllMatches |
    ForEach-Object { $_.Matches.Value } | Sort-Object -Unique > $OutputFile
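
To see the single ForEach-Object and the de-duplication in isolation, here is a small sketch that feeds made-up strings to Select-String through the pipeline instead of reading files (that substitution, and the sample text, are assumptions made only to keep the example self-contained). The regex is the one from the question:

```powershell
# Regex copied from the question; the sample lines are invented for illustration
$regex = '([a-zA-Z]{3,})://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)*?'
$lines = 'a http://example.com and https://example.org',
         'again http://example.com'

$urls = $lines |
  Select-String -Pattern $regex -AllMatches |
    # .Matches holds every match on the line; .Value is read from each of them
    ForEach-Object { $_.Matches.Value } |
      # Drop the repeated URL
      Sort-Object -Unique

$urls
```

Reading `.Value` on the whole Matches collection relies on member enumeration, which is available from PowerShell 3.0 on.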