I am trying to parse Twitch IRC chat into a more readable format. I have never used regex and am not sure how to go about this, even after reading tons of tutorials.

This is the raw output:

:nick!nick@nick.tmi.twitch.tv PRIVMSG channel :

I would like two regexes, one to extract the nick and one to extract the message, so that each can be used individually. Thanks!

Johnny Johnson
  • A hint: regexps are not always the solution. IRC protocol is much simpler to parse without. – Sami Kuhmonen Sep 23 '15 at 06:15
  • What have you tried? IRC takes parameters divided by spaces, so why not split on `" "`? – Mariano Sep 23 '15 at 06:17
  • @Mariano when I split with a space, it removes the spaces from the message, too – Johnny Johnson Sep 23 '15 at 06:32
  • Get nick ident and host from 1st item, command from 2nd, target from 3rd, join the rest to get the message – Mariano Sep 23 '15 at 06:33
  • @Mariano I did split on an exclamation mark for all PRIVMSGs and it gets the username, but after it writes the username, it also writes the original line with just the exclamation mark missing? – Johnny Johnson Sep 23 '15 at 06:48
  • Why don't you use a library instead of reinventing the wheel? Quick googling showed me this: https://www.nuget.org/packages/IrcDotNet/. If you want to learn how to parse the IRC protocol, then there are better ways than regexp – nomail Sep 23 '15 at 06:51

1 Answer

Regex is not your solution for this problem. If you really want to go down this road (but don't - keep reading!), then you can use something like this for the entire message:

:(?<nick>[^ ]+?)\!(?<user>[^ ]+?)@(?<host>[^ ]+?) PRIVMSG (?<target>[^ ]+?) :(?<message>.*)

There are capture groups defined for the nick, username, hostname, channel, and message. I haven't tested it, and it will fail on pretty much every other IRC event. There will also be ways to break it or get around the matching, because regex is the wrong sort of grammar tool for IRC. It's like hammering in nails with a screwdriver: it works some of the time, it's harder than it needs to be, and it can only be made to work better with a lot of time, effort, and pain. Why would you not use a hammer?
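For illustration only, here is a minimal Python sketch of the pattern above. Python's `re` module spells named groups as `(?P<name>...)` rather than `(?<name>...)`, so the pattern is translated accordingly. The function name `parse_privmsg` and the sample line (including the `#channel` target and the message text) are made up for this example and are not from the question.

```python
import re

# The answer's pattern, translated to Python's named-group syntax.
# It only understands PRIVMSG lines that carry a full nick!user@host prefix.
PRIVMSG_RE = re.compile(
    r":(?P<nick>[^ ]+?)!(?P<user>[^ ]+?)@(?P<host>[^ ]+?) "
    r"PRIVMSG (?P<target>[^ ]+?) :(?P<message>.*)"
)


def parse_privmsg(line):
    """Return a dict of the named groups, or None if the line does not match."""
    # IRC lines end in CRLF; strip it so it does not end up in the message.
    match = PRIVMSG_RE.match(line.rstrip("\r\n"))
    if match is None:
        return None
    return match.groupdict()


# Hypothetical sample line, shaped like the raw output in the question.
sample = ":nick!nick@nick.tmi.twitch.tv PRIVMSG #channel :hello world\r\n"
result = parse_privmsg(sample)
if result is not None:
    print(result["nick"], "said:", result["message"])

# Any other IRC event simply fails to match.
print(parse_privmsg("PING :tmi.twitch.tv"))
```

Note that anything other than a PRIVMSG falls through as `None`, which is the brittleness described above.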

A much better solution is to simply parse the message. The IRC specs in RFC 1459 and RFC 2812 give some pretty useful hints here. My advice from experience is to split on the first " :" (space then colon): everything after it is the last parameter of the message, which for a PRIVMSG is the message text. Then split the first half by spaces. If the first entry in your list starts with a colon, it is the prefix; strip the colon and split it again by ! and @ to get the parts of the nickname/username/hostname tuple. Follow this method, and you'll have the base of a much more robust and extensible parser than one you could ever build using regular expressions.
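The splitting approach described above could be sketched roughly as follows in Python. This is one possible reading of those steps, not a complete IRC parser: the function name `parse_irc_line` and the returned dictionary layout are invented for this example, and it assumes plain IRC lines without IRCv3 message tags.

```python
def parse_irc_line(line):
    """Split one raw IRC line into prefix parts, command, and parameters."""
    line = line.rstrip("\r\n")

    # Everything after the first " :" is the trailing (last) parameter.
    # Splitting only once keeps the spaces inside the message intact.
    head, sep, trailing = line.partition(" :")
    parts = [p for p in head.split(" ") if p]

    nick = user = host = None
    if parts and parts[0].startswith(":"):
        # The prefix looks like nick!user@host, or just a server name.
        prefix = parts.pop(0)[1:]
        rest, _, host = prefix.partition("@")
        nick, _, user = rest.partition("!")
        user = user or None
        host = host or None

    command = parts[0] if parts else ""
    params = parts[1:]
    if sep:
        params.append(trailing)

    return {"nick": nick, "user": user, "host": host,
            "command": command, "params": params}


# Hypothetical sample line, shaped like the raw output in the question.
msg = parse_irc_line(":nick!nick@nick.tmi.twitch.tv PRIVMSG #channel :hello there :)")
if msg["command"] == "PRIVMSG":
    target, text = msg["params"]
    print(msg["nick"], "in", target, "said:", text)
```

Because the same function also copes with lines such as `PING :tmi.twitch.tv`, other events can be handled by switching on `command` instead of writing a new pattern for each one.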

If you're doing this as a learning exercise, great! If not, you probably want to consider using a pre-built library to handle all the IRC communication for you.

stwalkerster