Return strings within parameters sed/grep/awk/gawk

Question

I need some help returning all the data in a log file that lies between two specific delimiters. Our logs usually look like the one below:

2018-04-17 03:59:29,243 TRACE [xml] This is just a test.
2018-04-17 13:22:24,230 INFO [properties] I believe this is another test.
2018-04-18 03:48:07,043 ERROR [properties] (Thread-13) UpdateType: more data coming here; ProcessId: 5010
2018-04-17 13:22:24,230 INFO [log] I need to retrieve this string here
and also this one as it is part of the same text
2018-04-17 13:22:24,230 INFO [det] I believe this is another test.

If I grep for "here", I only get the line containing the word, but I actually need to retrieve the whole entry; the line breaks are probably contributing to my problem. For the log above, I need:

2018-04-17 13:22:24,230 INFO [log] I need to retrieve this string here
and also this one as it is part of the same text

There can be several occurrences of "here" in the log file. I tried to do it with sed, but I can't find the right way to use the delimiters, which I think should be the whole date.

I really appreciate your help on this.

New example, after karakfa's comments:

2018-04-17 03:48:07,044 INFO  [passpoint-logger] (Thread-19) ERFG|1.0||ID:414d512049584450414153541541871985165165130312020203aa4b|Thread-19|||2018-04-17 03:48:07|out-1||out-1|
2018-04-17 03:59:29,243 TRACE [xml] (Thread-19) RAW MED XML: <?xml version="1.0" encoding="UTF-8" standalone="yes"?><MED:MED_PMT_Tmp_Notif xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://services.xxx.com/POQ/v01" xmlns:POQ="http://services.xxx.com/POQ/v01" xmlns:MED="http://services.xxx.com/MED/v1.2" version="1.2.3" messageID="15290140135778972043" Updat584ype="PGML" xsi:schemaLocation="http://services.xxx.com/MED/v1.2 MED_PMT_v.1.2.3.xsd">
    <MED_Space xmlns:ns2="http://services.xxx.com/MED/v1.2" xmlns:ns4="http://schemas.xmlsoap.org/soap/envelope/" xmlns:ns3="http://services.xxx.com/POQ_Header/v01" status="AVAIL" dest="MQX" aircraftType="DH8" aircraftConfig="120">
        <Space_ID partition="584" orig="ADD3" messageCreate="2018-04-17T03:59:29.202-05:00">
            <Space carrier="584" date="2018-04-18">0108</Space>
        </Space_ID>
        <DepartAndArrive estDep="2018-04-18T18:10:00+03:00" schedDep="2018-04-18T18:10:00+03:00" estArrival="2018-04-18T19:30:00+03:00" schedArrival="2018-04-18T19:30:00+03:00"/>
        <Sched_OandD orig="ADD3" dest="MQX"/>
    </MED_Space>
    <TRX_Record xmlns:ns2="http://services.xxx.com/MED/v1.2" xmlns:ns4="http://schemas.xmlsoap.org/soap/envelope/" xmlns:ns3="http://services.xxx.com/POQ_Header/v01">
        <TRX_ID FILCreate="2018-04-17T03:59:00-05:00" resID="1">TFRSVL</TRX_ID>
        <Space>
            <Inds revenue="1"/>
            <Identification nameID="1" dHS_ID="TFRSVL001" gender="X">
                <Name_First>SMITH MR</Name_First>
                <Name_Last>P584ER</Name_Last>
                <TT tier="0"/>
            </Identification>
                <TRXType>F</TRXType>
            <SRiuyx>0</SRiuyx>
            <GroupRes>1</GroupRes>
            <SystemInstances inventory="H">Y</SystemInstances>
            <OandD_FIL orig="ADD3" dest="MQX"/>
            <Store="584">0108</Store>
            <CodingSpec="584">0108</CodingSpec>
        </Space>
    </TRX_Record>
        <ns2:TRX_Count xmlns:ns2="http://services.xxx.com/MED/v1.2" xmlns:ns4="http://schemas.xmlsoap.org/soap/envelope/" xmlns:ns3="http://services.xxx.com/POQ_Header/v01">1</ns2:TRX_Count>
    <ns2:Transaction_D584ails xmlns:ns2="http://services.xxx.com/MED/v1.2" xmlns:ns4="http://schemas.xmlsoap.org/soap/envelope/" xmlns:ns3="http://services.xxx.com/POQ_Header/v01" sourceID="TPF">
        <Client_Entry_Info authRSX="54" agx="S4" code="ADD3">RESTORE AMEND:NEW-FIL/AFAX-UPDATED</Client_Entry_Info>
    </ns2:Transaction_D584ails>
</MED:MED_PMT_Tmp_Notif>
2018-04-17 03:59:29,244 INFO  [properties] (Thread-19) Updat584ype: PGML ; ProcessId: ##MISSING##

The command below does not return the whole text:

$ awk -v RS='(^|\n)[0-9 :,-]+' '/TFRSVL/{print rs,$0} {rs=RT}' file
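A likely cause, consistent with the fix TerminatorX reports in a comment on the accepted answer: the bracket expression `[0-9 :,-]` contains a space, so a newline followed by indentation also matches `RS`, and every indented XML line becomes a record of its own. The short demo below is only a sketch (GNU awk is required, since a regexp `RS` is a gawk extension; `show_records` is a hypothetical helper name). It prints each record in brackets so the split is visible.

```shell
#!/bin/sh
# Demo only, requires GNU awk: a regexp RS is a gawk extension.
# show_records (hypothetical name) prints every record that RS produces, in brackets.
show_records() {
    gawk -v RS="$1" '{ printf "[%s]\n", $0 }'
}

# A cut-down version of the TRACE entry: a header line, an indented XML line, a closing tag.
sample='2018-04-17 03:59:29,243 TRACE [xml] <a>
    <TRX_ID>TFRSVL</TRX_ID>
</a>'

# With the space, "newline + indentation" is also a separator, so the
# indented TRX_ID line is split off into its own record.
printf '%s\n' '--- RS with a space in [0-9 :,-]:'
printf '%s\n' "$sample" | show_records '(^|\n)[0-9 :,-]+'

# Without the space, only a newline followed by a digit, colon, comma or hyphen separates.
printf '%s\n' '--- RS without the space, [0-9:,-]:'
printf '%s\n' "$sample" | show_records '(^|\n)[0-9:,-]+'
```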

Please be clearer; your question is not clear. Show your expected sample output in the question, along with your attempt. — RavinderSingh13, May 03 '18 at 15:11
I think the expected result is pretty clear... I would like to retrieve the whole entry containing the word I'm looking for, even if it spans line breaks. I thought of using INFO|ERROR|TRACE as delimiters, or probably the date, since I need the date as well, but I can't figure it out. The sed commands I tried did not work :( — TerminatorX, May 03 '18 at 15:18
If you want to search for a string and get the complete line containing it, why not simply use `grep`? — RavinderSingh13, May 03 '18 at 15:19
I started with grep, but it doesn't work across line breaks: it doesn't return the whole text. grep -A/-B is not an option because the number of lines varies. — TerminatorX, May 03 '18 at 15:28
Stack Overflow is not a code writing service. Please show your code. Since Stack Overflow hides the Close reason from you: *Questions seeking debugging help ("why isn't this code working?") must include the desired behavior, a specific problem or error and the shortest code necessary to reproduce it in the question itself. Questions without a clear problem statement are not useful to other readers. See: [How to create a Minimal, Complete, and Verifiable example](http://stackoverflow.com/help/).* — jww, May 04 '18 at 01:46
@jww why did you downvote every answer again? Are you trying to teach us a lesson that we shouldn't answer a question unless your lordship approves of the question? — Ed Morton, May 04 '18 at 13:31

score 1 · Accepted Answer · answered May 03 '18 at 15:20

With GNU awk's multi-character record separator:

$ awk -v RS='(^|\n)[0-9 :,-]+' '/here/{print rs,$0} {rs=RT}' file

2018-04-18 03:48:07,043  ERROR [properties] (Thread-13) UpdateType: more data coming here; ProcessId: 5010

2018-04-17 13:22:24,230  INFO [log] I need to retrieve this string here
and also this one as it is part of the same text

NB: I cheated here by building the record separator from the characters that appear in the timestamp. You can formulate it exactly to eliminate false positives, where a continuation line happens to start with one of those characters, or add the log levels to the match as well.
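A sketch of the stricter separator the NB suggests, still GNU awk only: `RS` is a newline followed by a complete timestamp and a log level, so indented XML lines, or lines that merely start with a number, stay inside their record. Because this separator does not try to match at the start of the file, `RT` (the text that matched `RS`) is stashed and re-attached to the following record, which also avoids the doubled space in the output above. The level list and the helper name `grep_log` are assumptions, not part of the answer.

```shell
#!/bin/sh
# Sketch, GNU awk only (regexp RS, RT and {n} intervals).
# grep_log (hypothetical name): print every multi-line record matching the ERE in $1.
# Assumed record header: "YYYY-MM-DD HH:MM:SS,mmm LEVEL " with LEVEL from the list below.
grep_log() {
    gawk -v pat="$1" \
         -v RS='\n[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3} +(TRACE|DEBUG|INFO|WARN|ERROR|FATAL) ' '
        {
            rec = hdr $0            # re-attach the header that RS consumed
            sub(/\n+$/, "", rec)    # the last record keeps the file final newline
            if (rec ~ pat) print rec
            hdr = substr(RT, 2)     # RT minus its leading newline = next header
        }
    ' "$2"
}

# Usage: grep_log TFRSVL file
```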

karakfa

This is pretty good, but I'm still getting errors with the original log for some reason. I'm editing the question. – TerminatorX May 03 '18 at 16:11
It's working for me now. I also removed the space after the 9 in the regex, so it reads "[0-9:,-]", and it works as expected. Could you help me color the matched string? I've been trying to include "\033[32m" in the command, but I either color the whole text or get fatal errors. – TerminatorX May 08 '18 at 14:12
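A minimal sketch for the coloring question, assuming an ANSI-capable terminal: instead of printing the escape code in front of the whole record, wrap only the matched text. In an awk `gsub` replacement, `&` stands for the matched text, so `"\033[32m&\033[0m"` turns each match green and then resets the color. `colorize` is a hypothetical helper name, and it uses only POSIX awk, so it can be piped after any of the commands in this thread.

```shell
#!/bin/sh
# colorize (hypothetical name): wrap every match of the ERE in $1 in green.
# Assumes an ANSI-capable terminal: \033[32m = green, \033[0m = reset.
colorize() {
    awk -v pat="$1" '{
        # In the replacement, & is the matched text, so only the match
        # is wrapped, not the whole record.
        gsub(pat, "\033[32m&\033[0m")
        print
    }'
}

# Usage, e.g. after the accepted answer command:
#   awk -v RS='(^|\n)[0-9:,-]+' '/here/{print rs,$0} {rs=RT}' file | colorize here
```

To page through the colored output, `less -R` passes the escape codes through to the terminal.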

Ed Morton · Answer 2 · 2018-05-03T15:58:33.767

Assuming every record starts with a timestamp, then a string of all upper-case letters, then another string within square brackets:

$ cat tst.awk
/^[0-9]{4}(-[0-9]{2}){2} [0-9]{2}(:[0-9]{2}){2},[0-9]{3} [[:upper:]]+ +\[[^][]+\] / { prt() }
{ rec = (rec=="" ? "" : rec ORS) $0 }
END { prt() }

function prt() {
    if (rec ~ regexp) {
        print rec
        print "----"
    }
    rec = ""
}

$ awk -v regexp='here' -f tst.awk file
2018-04-18 03:48:07,043 ERROR [properties] (Thread-13) UpdateType: more data coming here; ProcessId: 5010
----
2018-04-17 13:22:24,230 INFO [log] I need to retrieve this string here
and also this one as it is part of the same text
----

You can change the starting regexp to something else if that's not restrictive enough, e.g. if the text within a record ends up with a string matching that same regexp at the start of a subsequent line (though I don't know how you'd actually deal with that given what you've shown us so far).

Also, think about what this is doing:

$ cat tst.awk
/^[0-9]{4}(-[0-9]{2}){2} [0-9]{2}(:[0-9]{2}){2},[0-9]{3} [[:upper:]]+ +\[[^][]+\] / { prt() }
{ rec = (rec=="" ? "" : rec ORS) $0 }
END { prt() }

function prt(   flds,recDate,recTime,recPrio,recType,recText) {
    split(rec,flds)
    recDate = flds[1]
    recTime = flds[2]
    recPrio = flds[3]
    recType = flds[4]
    gsub(/[][]/,"",recType)
    recText = rec
    sub(/([^[:space:]]+[[:space:]]+){4}/,"",recText)
    gsub(/[[:space:]]+/," ",recText)

    if (NR > 1) {
        if ( date=="" || date==recDate ) {
            printf "date = <%s>\n", recDate
            printf "time = <%s>\n", recTime
            printf "prio = <%s>\n", recPrio
            printf "type = <%s>\n", recType
            printf "text = <%s>\n", recText
            print "----"
        }
    }
    rec = ""
}

$ awk -v date='2018-04-18' -f tst.awk file
date = <2018-04-18>
time = <03:48:07,043>
prio = <ERROR>
type = <properties>
text = <(Thread-13) UpdateType: more data coming here; ProcessId: 5010>
----

$ awk -f tst.awk file
date = <2018-04-17>
time = <03:59:29,243>
prio = <TRACE>
type = <xml>
text = <This is just a test.>
----
date = <2018-04-17>
time = <13:22:24,230>
prio = <INFO>
type = <properties>
text = <I believe this is another test.>
----
date = <2018-04-18>
time = <03:48:07,043>
prio = <ERROR>
type = <properties>
text = <(Thread-13) UpdateType: more data coming here; ProcessId: 5010>
----
date = <2018-04-17>
time = <13:22:24,230>
prio = <INFO>
type = <log>
text = <I need to retrieve this string here and also this one as it is part of the same text>
----
date = <2018-04-17>
time = <13:22:24,230>
prio = <INFO>
type = <det>
text = <I believe this is another test.>
----

and imagine how easily you could use that approach to run precise queries on specific fields of your log records, generate CSVs for import into Excel, and so on.
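A minimal sketch of the CSV idea, assuming the same record layout as the script above: records are grouped the same way (with a portable timestamp regexp that also tolerates two spaces after the level, as in `INFO  [properties]`), the text field has its whitespace collapsed, and every field is double-quoted with embedded quotes doubled, as RFC 4180 expects. Quoting matters here because the time field itself contains a comma. `log2csv` is a hypothetical name.

```shell
#!/bin/sh
# log2csv (hypothetical name): one CSV row (date,time,level,type,text) per log record.
# Assumes every record starts with "YYYY-MM-DD HH:MM:SS,mmm LEVEL [type]".
log2csv() {
    awk '
        # Quote a CSV field, doubling any embedded double quotes (RFC 4180 style).
        function csv(s) { gsub(/"/, "\"\"", s); return "\"" s "\"" }
        function prt(   t, f) {
            if (rec == "") return
            t = rec
            gsub(/[ \t\r\n]+/, " ", t)                # join continuation lines
            split(t, f, " ")                          # f[1..4] = date time level [type]
            sub(/^\[/, "", f[4]); sub(/\]$/, "", f[4])
            sub(/^[^ ]+ [^ ]+ [^ ]+ [^ ]+ ?/, "", t)  # drop the four header fields
            sub(/ $/, "", t)
            print csv(f[1]) "," csv(f[2]) "," csv(f[3]) "," csv(f[4]) "," csv(t)
            rec = ""
        }
        BEGIN { print "date,time,level,type,text" }
        /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9],[0-9][0-9][0-9] +[A-Z]+ +\[[^]]*\]/ { prt() }
        { rec = (rec == "" ? "" : rec "\n") $0 }
        END { prt() }
    ' "$@"
}

# Usage: log2csv file > records.csv
```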

The results are good. Is there any way to do this without creating the tst.awk file? I'm asking because I need to SSH into different servers and I don't want to create the file on each one. — TerminatorX, May 03 '18 at 16:22
Of course. Just use `awk 'script' inputfile` instead of `awk -f scriptfile inputfile`. See the top of the awk man page. — Ed Morton, May 03 '18 at 19:31
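Following that suggestion, here is a sketch of the first script inlined as a shell function that could be pasted into a session on each server (the name `logrecs` is hypothetical). It spells out the timestamp without `{n}` intervals or `[[:upper:]]`, which some older awks do not support, and it allows one or more spaces after the level, since the second sample contains `INFO  [`.

```shell
#!/bin/sh
# logrecs (hypothetical name): print every multi-line log record matching the ERE in $1,
# each followed by a "----" line. Remaining arguments are the files to read (stdin if none).
logrecs() {
    re=$1; shift
    awk -v regexp="$re" '
        function prt() {
            if (rec != "" && rec ~ regexp) { print rec; print "----" }
            rec = ""
        }
        # A new record starts at "YYYY-MM-DD HH:MM:SS,mmm LEVEL [type]".
        /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9],[0-9][0-9][0-9] +[A-Z]+ +\[[^]]*\]/ { prt() }
        { rec = (rec == "" ? "" : rec ORS) $0 }
        END { prt() }
    ' "$@"
}

# Usage: logrecs here file
```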
