Optional fields when matching log file rows using regex

Question

I'm trying to parse a web log with regular expressions using RegexSerDe. It maps each regex group to a column in a table, and if a group is empty, it assigns null to that column.

I'm having trouble matching log rows with missing fields. There are two kinds of rows in this log:

<134>2016-10-23T23:59:59Z cache-iad2134 fastly[502801]: 52.55.94.131 "-" "-" Sun, 23 Oct 2016 23:59:59 GMT GET /apps/events/2016/10/11/3062653/?REC_ID=3062653&id=0 200

<134>2016-10-23T23:59:59Z cache-dfw1835 fastly[502801]: 1477267199

I wrote the regex below, which matches the first kind of row (the one with all fields):

^(\\S+) (\\S+) (\\S+) (\\S+) "(\\S+)" "(\\S+)" (.*) (\\d{3})

I played around with ? to make the fields after the first 4 optional, but I kept messing up the columns.

Any suggestions on how I should add the ? without changing the number of groups (so that the deserializer doesn't choke)? Or is there another way to do this that you would suggest?

Since you haven't shown the regexp with the optional modifiers, how are we supposed to tell you what you did wrong? The only thing I can think of is that you forgot to make the spaces between the fields optional as well. — Barmar, Oct 28 '16 at 01:27

score 1 · Answer 1 · answered Oct 28 '16 at 01:30


Put a non-capturing group around all the fields after the first 4, and make it optional.

^(\\S+) (\\S+) (\\S+) (\\S+)(?: "(\\S+)" "(\\S+)" (.*) (\\d{3}))?

Putting ?: at the beginning of a group makes it non-capturing, so the wrapper doesn't change the number of captured groups: there are still 8. When the optional part is absent, groups 5 through 8 simply come back empty.
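To check the behavior outside of Hive, here is a minimal Python sketch that runs the regex above against the two sample rows from the question. It is only an approximation of what the deserializer does: the names `PATTERN` and `parse` are made up for illustration, the pattern uses single backslashes (the doubled ones in the question are presumably string escaping for the table definition), and `fullmatch` is used on the assumption that the whole line has to match.

```python
import re

# Same regex as above, written with single backslashes in a raw string.
# The (?: ... )? wrapper makes fields 5-8 optional without adding a group.
PATTERN = re.compile(
    r'^(\S+) (\S+) (\S+) (\S+)'
    r'(?: "(\S+)" "(\S+)" (.*) (\d{3}))?'
)


def parse(line):
    """Return the 8 captured columns as a tuple, or None if the line does not match."""
    m = PATTERN.fullmatch(line)
    return m.groups() if m else None


# The two sample rows from the question.
full_row = (
    '<134>2016-10-23T23:59:59Z cache-iad2134 fastly[502801]: 52.55.94.131 '
    '"-" "-" Sun, 23 Oct 2016 23:59:59 GMT GET '
    '/apps/events/2016/10/11/3062653/?REC_ID=3062653&id=0 200'
)
short_row = '<134>2016-10-23T23:59:59Z cache-dfw1835 fastly[502801]: 1477267199'

# Number of capturing groups; the non-capturing wrapper is not counted.
print(PATTERN.groups)

for row in (full_row, short_row):
    print(parse(row))
```

For the short row the last four entries of the tuple are `None`, which is the Python counterpart of the empty groups that end up as null columns.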


Barmar


Thank you so much Barmar. That worked like a charm. I really appreciate your help. – Mete Kural Oct 28 '16 at 16:18
