I'm trying to extract a domain name from the payload_printable field of Suricata events in Splunk, and I found that this regex works for most cases:
source="*suricata*" alert.signature="ET JA3*"
| rex field=payload_printable "(?<dom>[a-zA-Z0-9\-\_]{1,}\.[a-zA-Z0-9\-\_]{2,}\.[a-zA-Z0-9\-\_]{2,})"
| table payload_printable, dom
The regular expression is:
(?<dom>[a-zA-Z0-9\-\_]{1,}\.[a-zA-Z0-9\-\_]{2,}\.[a-zA-Z0-9\-\_]{2,})
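To see exactly which strings this pattern accepts, here is a minimal Python sketch that tests it against a few candidates. Python's `re` writes the named group as `(?P<dom>...)` instead of PCRE's `(?<dom>...)`, but the rest of the pattern is unchanged. Apart from "activity.windows.com" and "Y.zORw._I", the test strings are illustrative only.

```python
import re

# Same pattern as in the rex command, using Python's named-group syntax.
ORIGINAL = re.compile(
    r"(?P<dom>[a-zA-Z0-9\-\_]{1,}\.[a-zA-Z0-9\-\_]{2,}\.[a-zA-Z0-9\-\_]{2,})"
)

def accepts(candidate: str) -> bool:
    """True if the whole candidate string matches the pattern."""
    return ORIGINAL.fullmatch(candidate) is not None

for s in ["activity.windows.com", "example.com", "Y.zORw._I", "a.b.c"]:
    print(f"{s!r}: {accepts(s)}")
```

Under this pattern a two-label name such as example.com is never matched, while a string like Y.zORw._I is accepted, because `_` is in the character class and the first label may be a single character.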
For example, if payload_printable looks like this:
...........^aO+.t....]......$.....mT*l.......&.,.+.0./.$.#.(.'.
...........=.<.5./.
...].........activity.windows.com..........
.................
.......................#...........
The domain "activity.windows.com" is extracted correctly. However, it doesn't work for the following payload, because the regex matches a different part of it that is not a domain:
...........^aO+]v;.~........:.Y.zORw._I..K>..&.,.+.0./.$.#.(.'.
...........=.<.5./.
...].........activity.windows.com..........
.................
.......................#...........
Here it extracts "Y.zORw._I" instead of "activity.windows.com".
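The likely reason is that rex keeps only the leftmost match by default (its `max_match` option defaults to 1), and "Y.zORw._I" occurs earlier in the payload than the real name. The Python sketch below lists every candidate and then tries a stricter variant. It assumes non-printable bytes appear as literal "." characters exactly as in the sample above. The stricter pattern is a hypothetical heuristic, not something Suricata or Splunk defines: it drops `_` from the labels and requires the last label to be letters only. Random bytes can still occasionally produce a match.

```python
import re

# Second sample payload as shown above; non-printable bytes are assumed to
# appear as literal "." characters in payload_printable.
PAYLOAD = "\n".join([
    "...........^aO+]v;.~........:.Y.zORw._I..K>..&.,.+.0./.$.#.(.'.",
    "...........=.<.5./.",
    "...].........activity.windows.com..........",
    ".................",
    ".......................#...........",
])

ORIGINAL = re.compile(
    r"(?P<dom>[a-zA-Z0-9\-\_]{1,}\.[a-zA-Z0-9\-\_]{2,}\.[a-zA-Z0-9\-\_]{2,})"
)
# Hypothetical stricter variant: no "_" in labels, at least two "label."
# parts (so three labels, like the original), and a letters-only last label.
STRICT = re.compile(r"(?P<dom>(?:[A-Za-z0-9-]+\.){2,}[A-Za-z]{2,})")

# Every candidate, left to right; rex with max_match=1 keeps only the first.
print(ORIGINAL.findall(PAYLOAD))
print(ORIGINAL.search(PAYLOAD).group("dom"))
print(STRICT.search(PAYLOAD).group("dom"))
```

In rex the stricter pattern would be written with PCRE's named-group syntax, i.e. `(?<dom>(?:[A-Za-z0-9-]+\.){2,}[A-Za-z]{2,})`. Like the original, it still ignores two-label names such as example.com.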
Another example:
...........^h.'`.o2...
.y.k>..e.ef...]..8.G..&.,.+.0./.$.#.(.'.
...........=.<.5./.
...p.........arc.msn.com..........
.................
.......................#.........h2.http/1.1...................
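Before changing the search, any stricter pattern should be checked against every sample collected so far. The following minimal Python sketch uses a hypothetical stricter pattern that allows no `_`, requires at least three labels, and accepts only letters in the last label. It runs that pattern over the three payloads exactly as shown in this post and keeps only the leftmost match, as rex does by default.

```python
import re

# Hypothetical stricter pattern: no "_" in labels, at least three labels,
# last label made of letters only.
STRICT = re.compile(r"(?P<dom>(?:[A-Za-z0-9-]+\.){2,}[A-Za-z]{2,})")

# The three payload_printable samples from this post, with the expected domain.
SAMPLES = [
    ("\n".join([
        "...........^aO+.t....]......$.....mT*l.......&.,.+.0./.$.#.(.'.",
        "...........=.<.5./.",
        "...].........activity.windows.com..........",
        ".................",
        ".......................#...........",
    ]), "activity.windows.com"),
    ("\n".join([
        "...........^aO+]v;.~........:.Y.zORw._I..K>..&.,.+.0./.$.#.(.'.",
        "...........=.<.5./.",
        "...].........activity.windows.com..........",
        ".................",
        ".......................#...........",
    ]), "activity.windows.com"),
    ("\n".join([
        "...........^h.'`.o2...",
        ".y.k>..e.ef...]..8.G..&.,.+.0./.$.#.(.'.",
        "...........=.<.5./.",
        "...p.........arc.msn.com..........",
        ".................",
        ".......................#.........h2.http/1.1...................",
    ]), "arc.msn.com"),
]

def extract(payload):
    """Return the leftmost match (like rex with max_match=1), or None."""
    m = STRICT.search(payload)
    return m.group("dom") if m else None

for payload, expected in SAMPLES:
    print(f"{extract(payload)!r} (expected {expected!r})")
```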
I don't know how to fix this. Thank you for your help.