1

I'm using StreamSets to parse a log file. The problem is that StreamSets parses it line by line, but each of my log records spans multiple lines, something like this:

00:01:03.930 [WebContainer : 41] Outbound message:
00:01:03.930 [WebContainer : 41] Values to hide NewPassword -- mask -- .+

I tried regex and grok patterns, but the newline token doesn't work for me. How can I make StreamSets parse a record that spans multiple lines?

Mohamed Seif
  • I can't answer your question — unless you're willing to use Python and pyparsing — but I'd appreciate if you could post a bigger sample of your log file that I could experiment with. Thank you! – Bill Bell Jul 09 '17 at 20:09
  • Are they always in pairs? How do you reliably group them? – metadaddy Jul 09 '17 at 23:57
  • In the File Tail component, on the Data Format tab, there's a field called "Pattern for Multiline". The help says "Regex pattern to detect main lines of text and log files with multiline elements", but I don't know how to write that regex, since I have to fill in the main regular expression too. How should I use this field? – Mohamed Seif Jul 11 '17 at 08:49

2 Answers

1

I created a custom processor to parse my file. I followed this tutorial and it worked just as I wanted: https://github.com/streamsets/tutorials/tree/master/tutorial-origin

Mohamed Seif
0

I would try using a JavaScript Evaluator (processor) and write the code below to handle multiple lines and treat them as a single record:

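for (var i = 0; i < records.length; i++) {
  try {
    // Read the 'items' field from the current record
    var items = records[i].value['items'];

    // <write your logic here to consider multiple lines>
    // and pass the merged record downstream with output.write(...)

  } catch (e) {
    // Send record to error
    error.write(records[i], e);
  }
}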
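
As a rough idea of what the "consider multiple lines" logic could look like, here is a minimal sketch in plain JavaScript that runs outside StreamSets. It assumes a grouping rule that the question does not confirm: consecutive lines that share the same timestamp and the same thread tag belong to one record. The names `LINE_PREFIX` and `groupLines` are made up for illustration, and the regex is only fitted to the two sample lines from the question.

```javascript
// Hypothetical prefix pattern: "HH:MM:SS.mmm [thread] message"
var LINE_PREFIX = /^(\d{2}:\d{2}:\d{2}\.\d{3}) \[([^\]]+)\] (.*)$/;

// Groups consecutive lines that share timestamp and thread (assumed rule).
function groupLines(lines) {
  var groups = [];
  var current = null;
  for (var i = 0; i < lines.length; i++) {
    var m = LINE_PREFIX.exec(lines[i]);
    if (m === null) {
      // Line without a prefix: treat it as a continuation of the current group
      if (current === null) {
        current = { timestamp: null, thread: null, messages: [] };
        groups.push(current);
      }
      current.messages.push(lines[i]);
      continue;
    }
    // Start a new group when the timestamp or the thread changes
    if (current === null || current.timestamp !== m[1] || current.thread !== m[2]) {
      current = { timestamp: m[1], thread: m[2], messages: [] };
      groups.push(current);
    }
    current.messages.push(m[3]);
  }
  return groups;
}

// The two sample lines from the question
var sample = [
  '00:01:03.930 [WebContainer : 41] Outbound message:',
  '00:01:03.930 [WebContainer : 41] Values to hide NewPassword -- mask -- .+'
];
var grouped = groupLines(sample);
console.log(JSON.stringify(grouped, null, 2));
```

Inside the evaluator, the same function could be fed the text of each record in the batch. Which field holds the raw line depends on the origin's configuration, and a group that straddles two batches would not be merged by this sketch.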
Anandkumar