I'm trying to count the words in my text file. However, when I use Get-Content, the array seems to be read letter by letter, so I can't compare the entries word by word. I hope you can help me out!

Clear-Host
#Functions

function Get-Articles (){

 foreach($Word in $poem){
    if($Articles -contains $Word){
       $Counter++
    }
}
    write-host "The number of Articles in your sentence: $counter"
}

#Variables

$Counter = 0

$poem = $line
$Articles = "a","an","the"

#Logic

$fileExists = Test-Path "text.txt"

if($fileExists) {
    $poem = Get-Content "text.txt"
    }
else
    {
    Write-Output "The file SamMcGee does not exist"  
    exit(0) 
    }

$poem.Split(" ")

Get-Articles
SkullNerd

2 Answers

What your script does, edited down a bit:

$poem = $line                    # set poem to $null (because $line is undefined)
$Articles = "a","an","the"       # $Articles is an array of strings, ok

                                 # check file exists (I skipped, it's fine)

$poem = Get-Content "text.txt"   # Load content into $poem, 
                                 # also an array of strings, ok

$poem.Split(" ")                 # Apply .Split(" ") to the array.
                                 # PowerShell does that once for each line.
                                 # You don't save it with $xyz = 
                                 # so it outputs the words onto the 
                                 # pipeline.
                                 # You see them, but they are thrown away.

Get-Articles                     # Call a function (with no parameters)


function Get-Articles (){        

                                 # Poem wasn't passed in as a parameter, so
 foreach($Word in $poem){        # Pull poem out of the parent scope. 
                                 # Still the original array of lines. unchanged.
                                 # $word will then be _a whole line_.

    if($Articles -contains $Word){    # $articles will never contain a whole line
       $Counter++
    }
}
    write-host "The number of Articles in your sentence: $counter"  # 0 every time
}

You probably wanted to do $poem = $poem.Split(" ") to make it an array of words instead of lines.

Or you could have passed the words of $poem into the function as a parameter:

function Get-Articles ($poem) {
...

Get-Articles $poem.Split(" ")
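Putting those pieces together, here is a minimal sketch of the parameter-passing fix. The function name Get-ArticleCount, its parameter names, and the inline sample lines are made up for illustration; the sample lines stand in for the output of Get-Content so the snippet runs without text.txt.

```powershell
# Hypothetical helper: takes the words as a parameter and returns the count
# instead of relying on variables from the parent scope.
function Get-ArticleCount {
    param(
        [string[]] $Words,
        [string[]] $Articles = @('a', 'an', 'the')
    )

    $count = 0
    foreach ($word in $Words) {
        # -contains compares case-insensitively by default
        if ($Articles -contains $word) {
            $count++
        }
    }
    return $count
}

# Sample lines (an assumption), shaped like what Get-Content returns.
$poem = 'the cat sat on a mat', 'an owl saw the cat'

# .Split(' ') is applied to each line; the result is one flat array of words.
$words = $poem.Split(' ')

$n = Get-ArticleCount -Words $words
Write-Host "The number of articles in your sentence: $n"
```

Returning the count rather than printing it inside the function keeps the counter local to the function and lets the caller decide what to do with the result.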

And you could make use of the PowerShell pipeline with:

$Articles = "a","an","the"

$poemArticles = (Get-Content "text.txt").Split(" ") | Where {$_ -in $Articles}
$counter = $poemArticles | Measure | Select -Expand Count
write-host "The number of Articles in your sentence: $counter"
TessellatingHeckler
  • Scriptblock in `Where` will be executed on each word, and since scriptblock invocation overhead is huge in PowerShell coupled with the slowness of pipelining in PS, this is the slowest solution by far. The fastest one is mentioned in a comment: `(Select-String '\b(a|an|the)\b' text.txt -AllMatches).Matches.Count`. The original code in the question is almost as fast provided it's fixed by using `split` on each line or on the entire text content string. – wOxxOm Nov 06 '16 at 03:04

TessellatingHeckler's helpful answer explains the problem with your approach well.

Here's a radically simplified version of your command:

$counter = (-split (Get-Content -Raw text.txt) -match '^(a|an|the)$').count
write-host "The number of articles in your sentence: $counter"

The unary form of the -split operator is key here: it splits the input into words by any run of whitespace between words, resulting in an array of individual words.

-match then matches the resulting array of words against a regex that matches words a, an, or the: ^(a|an|the)$.

The result is the filtered subarray of the input array containing only the words of interest, and .count simply returns that subarray's count.
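To see the individual steps, the one-liner can be unrolled as in the sketch below. The sample string is an assumption that stands in for the output of Get-Content -Raw text.txt; the intermediate variable names are only for illustration.

```powershell
# Sample text (an assumption): two lines, with a double space in the first one.
$text = "the cat sat on  a mat`nan owl saw the cat"

# Unary -split: splits on any run of whitespace (spaces, tabs, newlines).
$words = -split $text

# With an array on the left-hand side, -match acts as a filter
# and returns the elements that match the regex.
$hits = @($words -match '^(a|an|the)$')

$counter = $hits.Count
Write-Host "The number of articles in your sentence: $counter"
```

Wrapping the filter in @() is a defensive choice so that .Count is also well defined when nothing matches.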

mklement0
  • You'd think `Select-String` would be shorter than -split, Get-Content, and -match combined, eh? But no: `(Select-String '\b(a|an|the)\b' text.txt -AllMatches).Matches.Count`. And `((Get-Content -Raw text.txt) -replace '.*?\b(a|an|the)\b.*?' | measure -word).Words` is also not shorter. :-/ – TessellatingHeckler Nov 05 '16 at 06:17
  • @TessellatingHeckler: It would be interesting to see how your variations compare in terms of performance, however. – mklement0 Nov 05 '16 at 06:23
  • Impromptu testing, I just happen to have saved the PoSh help locally earlier, 1.4MB of text. Changing the file selector to `*.txt`, your approach takes 0.5s and finds 20,409 articles, my `select-string` takes 0.35s and finds 20,953 articles, and my -replace takes 5.8s and finds 83,712. Probably discount that last one. But my word boundary regex is possibly finding things like `"the` which your space split would miss. – TessellatingHeckler Nov 05 '16 at 06:38
  • @TessellatingHeckler: Good to know, and good point re `"the`, thanks. `Select-String` is powerful, but you don't see it used that often. – mklement0 Nov 05 '16 at 06:46
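
The `"the` difference mentioned in the comments can be reproduced on a single string. This is a small sketch with a made-up sample sentence; it pipes the string into Select-String instead of reading text.txt.

```powershell
# Sample sentence (an assumption) in which the article follows a quotation mark.
$text = 'He said "the end" and left.'

# Whitespace split: the token is '"the' (quote included), so the anchored regex misses it.
$splitCount = @((-split $text) -match '^(a|an|the)$').Count

# Word-boundary regex: \b matches between the quote and the letter t, so 'the' is found.
$selectCount = ($text | Select-String -Pattern '\b(a|an|the)\b' -AllMatches).Matches.Count

Write-Host "Whitespace split: $splitCount, Select-String: $selectCount"
```

Which count is the right one depends on whether quoted or punctuated words should be treated as articles.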