I would like to search a file (std_serverX.out) for values of the string cpu= that are 11 characters or longer. The file can contain a million lines or more.

To restrict the search further, I would like the search for cpu= to start only after the first occurrence of the string Java Thread Dump. In my source file, Java Thread Dump does not appear until approximately line 1013169 of a file 1057465 lines long, so the roughly 96% of the file that precedes it is irrelevant.

Here is a section of the file that I would like to search:

cpu=191362359.38 [reset 191362359.38] ms elapsed=1288865.05 [reset 1288865.05] s allocated=86688238148864 B (78.84 TB) [reset 86688238148864 B (78.84 TB)] defined_classes=468 
io= file i/o: 588014/275091 B, net i/o: 36449/41265 B, files opened:19, socks opened:0 [reset file i/o: 588014/275091 B, net i/o: 36449/41265 B, files opened:19, socks opened:0 ] 
user="Guest" application="JavaEE/ResetPassword" tid=0x0000000047a8b000 nid=0x1b10 / 6928 runnable [_thread_blocked (_call_back), stack(0x0000000070de0000,0x0000000070fe0000)] [0x0000000070fdd000] java.lang.Thread.State: RUNNABLE

Above, you can see that the value in cpu=191362359.38 is 12 characters long (including the full stop and 2 decimal places). How do I match it so that cpu= values shorter than 11 characters are ignored and not printed to file?

Here is what I have so far:

Get-Content -Path .\std_server*.out | Select-String '(cpu=)' | out-File  -width 1024 .\output.txt

I have stripped my command down to its absolute basics so I do not get confused by other search requirements.

Also, I want this command to be basic enough to run as a single command line in PowerShell, if possible. So no advanced scripts or defined variables, if we can avoid it... :)

This is related to a previous question of mine, which became complicated because I did not define my requirements precisely.

Thanks in advance for your help.

Antóin


2 Answers

Use a regex that looks for cpu= followed by 9 digits, a literal ., and 1 or more digits (so the value is at least 11 characters long). It can all go on one line:

Get-Content -Path .\std_server*.out | 
 Select-String -Pattern 'cpu=\d{9}\.\d+' -AllMatches | 
  Select-Object -ExpandProperty matches  | 
    Select-Object -ExpandProperty value
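
The two requirements from the question (skip everything before the first Java Thread Dump line, and keep only values of 11 or more characters) can be combined in one pipeline. The following is a minimal sketch, not a definitive solution. It assumes a small temporary sample file whose name and contents are hypothetical stand-ins for std_serverX.out, uses a single flag variable `$seen`, and uses the character class `[\d.]{11,}` as one possible reading of "11 characters or greater".

```powershell
# Hypothetical sample data standing in for std_serverX.out
$sample = Join-Path ([System.IO.Path]::GetTempPath()) 'std_server_sample.out'
@(
    'cpu=99999999999.99 appears before the dump and must be ignored'
    ''
    '==== Java Thread Dump : Mon Jan 25 11:20:06 2016 ===='
    'cpu=191362359.38 [reset 191362359.38] ms elapsed=1288865.05'
    'cpu=1234.56 [reset 1234.56] ms elapsed=10.00'
) | Set-Content -Path $sample

# Flag that flips once the marker line has been seen
$seen = $false

$values = Get-Content -Path $sample |
    ForEach-Object {
        if ($seen) { $_ }                                        # pass lines after the marker
        elseif ($_ -like '*Java Thread Dump*') { $seen = $true } # marker found, start passing
    } |
    Select-String -Pattern 'cpu=[\d.]{11,}' -AllMatches |        # value of 11+ digits/dots
    Select-Object -ExpandProperty Matches |
    Select-Object -ExpandProperty Value

$values
```

With the real file, the `Get-Content` path would be `.\std_server*.out` and the final `$values` could be piped to `Out-File -Width 1024 .\output.txt` as in the question. Note that when the wildcard matches several files, the flag stays set after the first marker, so later files are not skipped up to their own marker.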
Kiran Reddy

It can certainly be done, but piping a million lines, the first 96% of which you know to be irrelevant, is not going to be very fast or efficient.

A faster approach would be to use a StreamReader and just skip over the lines until the Java Thread Dump string is found:

$CPULines = @()

foreach($file in Get-Item .\std_server*.out)
{

    # Create stream reader from file
    $Reader = New-Object -TypeName 'System.IO.StreamReader' -ArgumentList $file.FullName
    $JTDFound = $false

    # Read file line by line; compare against $null so a blank line does not end the loop early
    while($null -ne ($line = $Reader.ReadLine()))
    {
        # Keep looking until 'Java Thread Dump' is found 
        if(-not $JTDFound)
        {
            $JTDFound = $line.Contains('Java Thread Dump')
        }
        else
        {
            # Then, if a value matching your description is found, add that line to our results
            if($line -match '^cpu=([\d\.]{11,})\s')
            {
                $CPULines += $line
            }
        }
    }

    # dispose of the stream reader
    $Reader.Dispose()
}

# Write output to file
$CPULines |Out-File .\output.txt
Mathias R. Jessen
  • Hi Matthias. It is quite slow, I admit :) Unfortunately, when I try to run your logic, I get a blank file printed out :( executionpolicy = RemoteSigned. Saved it as a ps1 file and launched it in PS as: & "directory name\AnalyzeCPU.ps1" – Antóin Feb 05 '16 at 12:05
  • And you are absolutely sure that the string `Java Thread Dump` (exactly like that) is found in a line in the file, before the occurrence of cpu=... ? – Mathias R. Jessen Feb 05 '16 at 12:07
  • Here is how it appears in the file. ================================================================================ Java Thread Dump : Mon Jan 25 11:20:06 2016 ================================================================================ – Antóin Feb 05 '16 at 12:23
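
One possible cause of a blank output file with a StreamReader loop, offered as a hedged guess rather than a confirmed diagnosis: if the loop condition tests the truthiness of the `ReadLine()` result, as in `while(($line = $Reader.ReadLine()))`, the loop ends at the first blank line, because PowerShell treats an empty string as false. Comparing against `$null` reads to the true end of the input. The sketch below shows the difference using a `System.IO.StringReader` over hypothetical four-line sample text, so it runs without any file.

```powershell
# Hypothetical sample text: the second line is blank
$text = "first`n`nJava Thread Dump`ncpu=191362359.38 ms"

# Truthiness test: the empty string returned for the blank line ends the loop
$reader = New-Object System.IO.StringReader -ArgumentList $text
$truthy = 0
while (($line = $reader.ReadLine())) { $truthy++ }
$reader.Dispose()

# Explicit null test: only the real end of input ends the loop
$reader = New-Object System.IO.StringReader -ArgumentList $text
$explicit = 0
while ($null -ne ($line = $reader.ReadLine())) { $explicit++ }
$reader.Dispose()

"truthiness test read $truthy line(s); null test read $explicit line(s)"
# prints: truthiness test read 1 line(s); null test read 4 line(s)
```

If the log has a blank line anywhere before the Java Thread Dump banner, the truthiness form would stop before ever reaching it, which would match the empty result described above.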