3

I have a very large log file in which I need to count the occurrences of all the variations of a particular string; that is:

There are a large number of file IDs that appear in the format AA000####. I have to find out what the top five or ten IDs are in this file (which ones appear the most times).

I figure this can be done with select-string and regular expressions?

Doug Chase

2 Answers

4

If you want to break out just the title portion (which I'm guessing you do) and not group on the whole URL (which could contain information specific to that visit), you need to extract the value of the title parameter, like so:

Get-Content "test.txt" | ForEach-Object { if ($_ -match 'title=([^&]+)') { $Matches[1] } } | Group-Object | Sort-Object -Descending Count
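
The same capture-then-group idea can be pointed at the IDs from the question instead of a title parameter. This is only a sketch: it assumes the IDs are the literal prefix AA000 followed by exactly four digits, and the sample lines are made up so the snippet runs on its own.

```powershell
# Hypothetical sample lines standing in for the real log file.
$lines = @(
    'GET /item?id=AA0001234 200',
    'GET /item?id=AA0005678 200',
    'GET /item?id=AA0001234 304',
    'GET /index.html 200',
    'GET /item?id=AA0001234 200',
    'GET /item?id=AA0005678 200',
    'GET /item?id=AA0009999 200'
)

# Assumed ID shape: AA000 followed by exactly four digits.
$idPattern = '(AA000\d{4})'

$top = $lines |
    # Emit the captured ID for each line that contains one (first ID per line only).
    ForEach-Object { if ($_ -match $idPattern) { $Matches[1] } } |
    Group-Object |
    Sort-Object -Property Count -Descending |
    # Keep the five most frequent IDs.
    Select-Object -First 5 -Property Count, Name

$top | Format-Table -AutoSize
```

To run it against a real file, replace `$lines` with `Get-Content .\test.txt`. Note that `-match` only records the first ID on each line, which is fine when every log line carries at most one ID.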
Scott Keck-Warren
  • This worked GREAT. Thanks!! I learned something today :) – Doug Chase Dec 09 '11 at 23:26
  • One thing to keep in mind: you can also pipe to Format-Table with -AutoSize ( `| ft -autosize` ) in case your counts are large enough that PowerShell truncates the output with an ellipsis. – psantiago Oct 31 '17 at 13:59
2

This is off the top of my head, but you should be able to do this with a one-liner.

You can either shove it in a variable and get the length of that variable like so:

# @(...) forces an array, so the length is correct even when only one line matches.
$count = @(get-content .\test.txt | select-string -pattern "AA000")
$count.length

Or you can just do it all inline by using parens:

@(get-content .\test.txt | select-string -pattern "AA000").length
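
Note that the length above is the number of matching lines, not the number of times the string appears. A rough sketch of the difference for a single ID follows; the sample lines are invented, and the ID PX0001582 is borrowed from the comment thread purely as an example.

```powershell
# Hypothetical sample lines; the first one mentions the ID twice.
$lines = @(
    'PX0001582 requested; retry PX0001582',
    'PX0001583 requested',
    'PX0001582 requested'
)

$id = 'PX0001582'

# -AllMatches records every hit on a line instead of only the first one.
# [regex]::Escape keeps the ID literal in case it contains regex metacharacters.
$hits = @($lines | Select-String -Pattern ([regex]::Escape($id)) -AllMatches)

# Number of lines that contain the ID at least once.
$lineCount = $hits.Count

# Total number of occurrences across all lines.
$occurrenceCount = ($hits | ForEach-Object { $_.Matches.Count } | Measure-Object -Sum).Sum

"Lines: $lineCount, occurrences: $occurrenceCount"
```

If each log line can only ever contain one ID, the two numbers agree and the simpler line count is enough.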

You can get your top counts with the group-object cmdlet.

get-content .\test.txt | group-object | export-csv out.csv

That is pretty ugly right now, since it groups on whole lines rather than just the IDs, but you should be able to go from there.
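
One possible way to go from there, sketched under the assumption that the IDs look like AA000 plus four digits: let Select-String pull out every ID with -AllMatches, then group on the matched text rather than the whole line. The temp file name and its contents are hypothetical and exist only to make the snippet self-contained.

```powershell
# Write a small hypothetical sample log to the temp folder.
$sample = Join-Path ([System.IO.Path]::GetTempPath()) 'id-count-sample.log'
Set-Content -Path $sample -Value @(
    'AA0001111 requested, then AA0001111 again',
    'AA0002222 requested',
    'AA0001111 and AA0002222 in one line',
    'no id here'
)

$counts = Select-String -Path $sample -Pattern 'AA000\d{4}' -AllMatches |
    # Unroll every regex match on every matching line.
    ForEach-Object { $_.Matches } |
    # Keep only the matched text (the ID itself).
    ForEach-Object { $_.Value } |
    Group-Object |
    Sort-Object -Property Count -Descending |
    # Top ten IDs by frequency.
    Select-Object -First 10

$counts | Format-Table -Property Count, Name -AutoSize

Remove-Item -Path $sample
```

Passing the file to Select-String with -Path avoids piping every line through Get-Content first, which tends to matter on very large logs. The result can still be sent to Export-Csv instead of Format-Table if a file is preferred.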

Zypher
  • I need to count how many times a given AA000#### appears in the file. For example, how many times was item PX0001582 requested? What I'm really interested in is which items were requested the most times. – Doug Chase Dec 08 '11 at 20:50