
I want to search a text file for two strings. The output should only be printed if the value of the first string is longer than 8 characters.

Here is the command I am trying to run:

Get-Content -Path .\std_server*.out | Select-String '((if "cpu=" -gt 8)|application=")' | out-File  -width 1024 .\test.txt

So I want to search the files matching std_server*.out for both the CPU and APPLICATION values, but I only want to print them if the CPU value is longer than 8 characters.

How do I do that?

Currently I have a basic version that works with '(cpu=|application=")'; however, it prints every CPU value, and I only want the application to be printed when the CPU value is unreasonably high (longer than 8 characters).

Thanks in advance.

Matt
Antóin
  • What does your text file look like? Show some example content. – Avshalom Jan 28 '16 at 12:32
  • Need to see sample source that shows both cases. – Matt Jan 28 '16 at 13:07
  • Does CPU=something exist on its own line? `'^(cpu=.{8,}$|application=")'`? That would match lines that start with cpu= and have at least 8 characters following the equals sign, or any line starting with application=". – Matt Jan 28 '16 at 13:18
  • Hi Matt. I have opened a new thread. I think the new thread should be much clearer. http://stackoverflow.com/questions/35221414/match-select-string-of-11-characters-and-also-starting-after-a-certain-point-i – Antóin Feb 05 '16 at 10:13

2 Answers


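If they're on their own lines, you could use something like the following (cpu= is 4 characters, so a line longer than 12 characters has more than 8 characters after it):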

Get-Content -Path .\std_server*.out | Where-Object { ($_ -like 'cpu=*' -and $_.Length -gt 12) -or $_ -like 'application=*' } | Out-File -Width 1024 .\test.txt

Or you could also check for a terminating character (a semicolon, for example):

Get-Content -Path .\std_server*.out | Where-Object { (($_ -like 'cpu=*' -and $_.Length -gt 12) -or $_ -like 'application=*') -and $_ -like '*;' } | Out-File -Width 1024 .\test.txt

Or just use a regular expression, like

Get-Content -Path .\std_server*.out | Select-String '^(cpu=[\d.]{8,}|application="[^"]*")$' | Out-File -Width 1024 .\test.txt

if they're on their own lines, or

Get-Content -Path .\std_server*.out | Select-String '(cpu=[\d.]{8,}|application="[^"]*");' | Out-File -Width 1024 .\test.txt

if they're delimited by a character (in this case ';').
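A quick way to compare the two kinds of filter is to run them over a few in-memory lines instead of std_server*.out. This is a minimal sketch: the sample lines are made up in the cpu=/application= format from the question, and the regex uses {9,} so that, like the Length -gt 12 check, only values longer than 8 characters pass.

```powershell
# Hypothetical sample lines standing in for std_server*.out, one value per line
$lines = @(
    'cpu=1234.5'
    'cpu=191362359.38'
    'application="JavaEE/ResetPassword"'
    'user="Guest"'
)

# Non-regex filter: "cpu=" is 4 characters, so Length -gt 12 means more than 8 after it.
# -like is case-insensitive, so CPU= and cpu= are treated the same.
$viaLike = @($lines | Where-Object {
    ($_ -like 'cpu=*' -and $_.Length -gt 12) -or $_ -like 'application=*'
})

# Regex filter: anchored; 9 or more digits/dots (i.e. more than 8) or a quoted application name
$viaRegex = @($lines |
    Select-String -Pattern '^(cpu=[\d.]{9,}|application="[^"]*")$' |
    ForEach-Object { $_.Line })

$viaLike
$viaRegex
```

Both filters keep the long cpu= line and the application= line and drop the short CPU value and unrelated lines.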

Silvius
  • OP did not mention anything about semicolons, and your regex solution is missing the application portion of the match. I don't think you need to escape the semicolon either. – Matt Jan 28 '16 at 13:28
  • In general, though, your non-regex approaches are nice. – Matt Jan 28 '16 at 13:47
  • Thank you! I added the semicolon just as an example of a delimiting character, in case the values weren't each on their own lines. Also, you're right about the regex; I edited the answer. – Silvius Jan 28 '16 at 14:52
  • Thanks for the replies guys. Here is a section of the file I am trying to search: – Antóin Jan 29 '16 at 14:00
  • cpu=191362359.38 [reset 191362359.38] ms elapsed=1288865.05 [reset 1288865.05] s allocated=86688238148864 B (78.84 TB) [reset 86688238148864 B (78.84 TB)] defined_classes=468 io= file i/o: 588014/275091 B, net i/o: 36449/41265 B, files opened:19, socks opened:0 [reset file i/o: 588014/275091 B, net i/o: 36449/41265 B, files opened:19, socks opened:0 ] user="Guest" application="JavaEE/ResetPassword" tid=0x0000000047a8b000 nid=0x1b10 / 6928 runnable [_thread_blocked (_call_back), stack(0x0000000070de0000,0x0000000070fe0000)] [0x0000000070fdd000] java.lang.Thread.State: RUNNABLE – Antóin Jan 29 '16 at 14:02
  • In cpu=191362359.38 the value after cpu= is 12 characters long, so I think I should actually be searching for cpu > 11. When this is true, I also want to print out the name of the application that is causing the high CPU. If the value is 11 characters or fewer, it's not an issue, so in that case I am not interested in that application. – Antóin Jan 29 '16 at 14:17
  • Ok, but where is the application name in the file, with regards to the cpu value? It would be useful if you could post a section of the file containing 2 application entries. You can edit out the other unimportant stuff ('defined_classes', etc.) if it's too long, just keep the application name and the cpu value. It would give us a better idea of how the information is structured in the file. – Silvius Jan 29 '16 at 15:24
  • Hi Silvius. You can see the application at the beginning of the 5th line: application="JavaEE/ResetPassword" – Antóin Feb 03 '16 at 07:15
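Using the excerpt pasted in the comments, a sketch of "print the application only when the CPU value is long" could look like this. It assumes the cpu= and application= values can be read from the same block of text (in the real file they are on separate lines), shortens the excerpt, and uses the revised threshold of 11 characters. The value 191362359.38 itself is 12 characters long.

```powershell
# Shortened copy of the pasted excerpt, treated here as a single string (assumption)
$record = 'cpu=191362359.38 [reset 191362359.38] ms elapsed=1288865.05 [reset 1288865.05] s user="Guest" application="JavaEE/ResetPassword" tid=0x0000000047a8b000'

$threshold = 11   # revised limit from the comments: flag values longer than 11 characters
$report = $null

if ($record -match 'cpu=(?<cpu>[\d.]+)') {
    $cpu = $Matches['cpu']   # save it before the next -match overwrites $Matches
    if ($cpu.Length -gt $threshold -and $record -match 'application="(?<app>[^"]*)"') {
        $report = '{0} (cpu={1}, {2} characters)' -f $Matches['app'], $cpu, $cpu.Length
    }
}

$report
```

Capturing the number with a named group and checking its .Length keeps the length test in PowerShell rather than in the regex quantifier, which makes the threshold easy to change.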

That nested logic with the if won't work, as you have seen: the pattern passed to Select-String is a regular expression, not PowerShell code, so the if and -gt inside it are not evaluated. You need a quantifier on the characters that follow cpu= to express the length condition. You could also measure the match with some post-processing, but that might create more headaches, since you would have to work around the application=" matches as well.

Presumably your file has those strings at the start of a line, with nothing else following them? To ensure correct matches, it is a good idea to use anchors.

Also, since Select-String returns MatchInfo objects, you might as well use Export-Csv with the relevant properties.

$pattern = '^(cpu=.{8,}|application=".*)$'
# Select-String reads the files itself, so Get-Content is not needed here
Select-String -Path .\std_server*.out -Pattern $pattern |
        Select-Object Path,LineNumber,Line |
        Export-Csv -NoTypeInformation .\test.txt

cpu=.{8,} matches "cpu=" literally, followed by at least 8 more characters. The anchors ^ and $ ensure that the whole line, from start to end, is exactly what we want and nothing more.

Your first and last sentences seem to conflict, but it is possible that the whole match (including cpu=) is supposed to be 8 characters, in which case you just want the number 4, i.e. .{4,}.
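To see what the anchors and the two quantifier choices actually accept, the patterns can be tried against a few hypothetical lines with -match, which, like Select-String, is case-insensitive by default. This is only an illustration; the sample lines are invented.

```powershell
$anchored8   = '^(cpu=.{8,}|application=".*)$'   # pattern from the answer
$anchored4   = '^(cpu=.{4,}|application=".*)$'   # whole cpu= line at least 8 characters
$unanchored8 = '(cpu=.{8,}|application=".*)'     # same as the first, without ^ and $

# Hypothetical test lines
$samples = @(
    'cpu=12345'
    'cpu=123456789'
    'application="JavaEE/ResetPassword"'
    'xcpu=123456789'
)

$with8       = @($samples | Where-Object { $_ -match $anchored8 })
$with4       = @($samples | Where-Object { $_ -match $anchored4 })
$withoutAnch = @($samples | Where-Object { $_ -match $unanchored8 })

$with8
$with4
$withoutAnch
```

The {4,} variant also lets cpu=12345 through, while dropping the anchors lets xcpu=123456789 through because the pattern may then match anywhere inside the line.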

Matt
  • I have decided to change my logic due to the layout of the file I am searching. Unfortunately, in the file, 'application=' is always on the 3rd line after 'cpu=', and I want to make this as basic as possible, so I am wondering whether I can make Select-String run only after the first occurrence of the string 'Java Thread Dump'. That way the values for 'cpu=' and 'application=' would only be printed after the first occurrence of 'Java Thread Dump', which would massively reduce the output. Sorry for the change of direction. Thanks for your help so far :) – Antóin Feb 03 '16 at 10:35
  • For example, the file I am searching is 1057465 lines long; however, the first thread dump I triggered occurred at line 1013169, so if I could get the search to start at 'Java Thread Dump', I would filter out 95% of the file straight away. – Antóin Feb 03 '16 at 10:50
  • How do you propose starting the search at Java Thread Dump? It would still have to read the whole file up to that point to get that information. A file a million lines long would have been nice to know in the question, as that is an enormous file to search and warrants different search methods. The way I see it, you have two answers to your current question. You need to look into the logic here, but using a StreamReader. – Matt Feb 03 '16 at 11:38
  • Hi @Matt. I understand that it would have to read up to the point where 'Java Thread Dump' appears, but my hope is that it would only start printing to the file from that point, thereby reducing my output by nearly 95%. As I said, I want to keep the command as basic as possible. So is there any way to stop it printing to the file until it has encountered the line 'Java Thread Dump'? – Antóin Feb 03 '16 at 14:05
  • @Antóin I am confused now. The code I have above will print the lines that match what you asked for. Is the problem that it returns too much? What you are asking can be done, but again, you have changed the scope of the question with this new information. I would recommend asking a new one referencing this one. – Matt Feb 03 '16 at 14:20
  • OK. Thanks Matt. I will do as you recommend. Thanks a lot for your help :) – Antóin Feb 04 '16 at 13:01
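For the narrowed-down follow-up (skip everything before the first 'Java Thread Dump' line, then take the application from the 3rd line after each long cpu= value), a streaming sketch along the lines of the StreamReader suggestion could look like this. Find-HighCpuApplication is a hypothetical helper name, and the 3-line offset and 11-character threshold are taken from the comments as assumptions about the file layout. It reads through a TextReader so the million-line file is never loaded into memory at once; here it is demonstrated on a small made-up sample.

```powershell
# Hypothetical helper: stream lines from a TextReader, ignore everything before the
# first line containing "Java Thread Dump", and for every cpu= value longer than
# $MaxCpuLength characters, report the application="..." found $LinesToApplication
# lines further down (assumed layout, taken from the comments above).
function Find-HighCpuApplication {
    param(
        [System.IO.TextReader]$Reader,
        [int]$MaxCpuLength = 11,
        [int]$LinesToApplication = 3
    )

    $started = $false
    $pendingCpu = $null   # long cpu value still waiting for its application line
    $countdown = 0        # lines left until that application line

    while ($null -ne ($line = $Reader.ReadLine())) {
        # Skip everything up to and including the first thread-dump marker
        if (-not $started) {
            if ($line.Contains('Java Thread Dump')) { $started = $true }
            continue
        }

        # While waiting for the application line, lines in between are not checked for cpu=
        if ($null -ne $pendingCpu) {
            $countdown--
            if ($countdown -eq 0) {
                if ($line -match 'application="(?<app>[^"]*)"') {
                    [pscustomobject]@{ Cpu = $pendingCpu; Application = $Matches['app'] }
                }
                $pendingCpu = $null
            }
            continue
        }

        # A cpu= value longer than the threshold starts the countdown
        if ($line -match 'cpu=(?<cpu>[\d.]+)' -and $Matches['cpu'].Length -gt $MaxCpuLength) {
            $pendingCpu = $Matches['cpu']
            $countdown = $LinesToApplication
        }
    }
}

# Small in-memory sample (made up, modelled on the excerpt in the comments)
$sample = @'
cpu=99999999999.99 before the marker, so it is ignored
Java Thread Dump
cpu=191362359.38 [reset 191362359.38] ms
elapsed=1288865.05 [reset 1288865.05] s
user="Guest"
application="JavaEE/ResetPassword"
cpu=12.5 [reset 12.5] ms
elapsed=1.0 s
user="Guest"
application="JavaEE/Other"
'@

$result = @(Find-HighCpuApplication -Reader ([System.IO.StringReader]::new($sample)))
$result
```

For the real files, the same function could be fed one StreamReader per file, e.g. `Get-ChildItem .\std_server*.out | ForEach-Object { $r = [System.IO.StreamReader]::new($_.FullName); try { Find-HighCpuApplication -Reader $r } finally { $r.Dispose() } } | Export-Csv -NoTypeInformation .\test.txt`. The full path is passed because .NET resolves relative paths against the process working directory rather than the current PowerShell location.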