
I am tokenizing a string such as:

BAS=W34 N29 E24 S29$FOP=E6 S6 W6 N6$. A Comment

The period is the "end of command" character, not a "beginning of comment" character. How can I add a regular expression rule to the lexer such that the period is a token unto itself, but anything after the period is a token with type COMMENT? I tried /\..+$/, but that includes the period in the comment.

Lucas Trzesniewski
Chet
  • What program are you using? – melpomene Oct 21 '15 at 19:14
  • How about matching everything after the period that is not itself a period: `(?<=[.])[^.]+$` –  Oct 21 '15 at 19:15
  • I wrote the lexer myself in C#, and it uses .NET regular expressions. It is a lot like http://blogs.msdn.com/b/drew/archive/2009/12/31/a-simple-lexer-in-c-that-uses-regular-expressions.aspx (see the Implementation section) – Chet Oct 21 '15 at 19:16
  • This will do the job: `(?<=\.).+` – Lucas Trzesniewski Oct 21 '15 at 19:19
  • Using a capturing group would be more efficient, but I am not sure if you can do it with your lexer. E.g. [`^[^.]*\.\s*([^.]*)$`](https://regex101.com/r/bA1mT6/1). It is much easier without regex in C#: `s.Split('.')[1].Trim()`. – Wiktor Stribiżew Oct 21 '15 at 19:25
  • @stribizhev Guess my lexer can't do it, because that returned my entire string. – Chet Oct 21 '15 at 19:28
  • @LucasTrzesniewski Thanks, that did it. – Chet Oct 21 '15 at 19:28
  • Then your lexer will not be efficient. I would add a group extraction if I were in your shoes. It takes `(?<=\.).+$` 79 steps to match the string, and it takes just 9 steps for my above regex to get it. Look-behinds are costly. – Wiktor Stribiżew Oct 21 '15 at 19:30
  • I wonder why my question has been downvoted. – Chet Oct 22 '15 at 13:20
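
The capturing-group suggestion from the comments can be sketched as follows. This is a minimal illustration in Python, whose `re` module stands in here for .NET regular expressions; that substitution and the helper name `extract_comment` are assumptions, not part of the asker's lexer. It shows that the whole match spans the entire line (a likely reason a lexer that only reads the full match "returned my entire string"), while group 1 holds just the comment:

```python
import re

# Pattern from the comment thread: skip to the first period,
# skip optional whitespace, then capture everything up to the end.
COMMENT_AFTER_PERIOD = re.compile(r"^[^.]*\.\s*([^.]*)$")


def extract_comment(line):
    """Return the text captured after the first period, or None if no match."""
    match = COMMENT_AFTER_PERIOD.match(line)
    return match.group(1) if match else None


line = "BAS=W34 N29 E24 S29$FOP=E6 S6 W6 N6$. A Comment"

whole_match = COMMENT_AFTER_PERIOD.match(line).group(0)  # the entire line
comment = extract_comment(line)                          # only the captured group

# Rough equivalent of the C# one-liner s.Split('.')[1].Trim()
split_comment = line.split(".")[1].strip()

print(repr(whole_match))
print(repr(comment))
print(repr(split_comment))
```

A lexer built around whole-match rules would need to be extended to read a specific group before this pattern becomes useful as a COMMENT rule.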

2 Answers


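You could try two approaches: a non-capturing group or a look-behind. Note that a non-capturing group still consumes the period, so the period remains part of the match; only the look-behind, which is zero-width, leaves it out: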

(?:\.).+$

(?<=\.).+$
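
To see how the two patterns differ on the sample line from the question, here is a small hedged sketch. It uses Python's `re` module as a stand-in for .NET regular expressions (an assumption; both engines accept this look-behind syntax), and the variable names are illustrative only:

```python
import re

line = "BAS=W34 N29 E24 S29$FOP=E6 S6 W6 N6$. A Comment"

# Non-capturing group: the period is still consumed, so it is part of the match.
non_capturing = re.search(r"(?:\.).+$", line).group(0)

# Look-behind: zero-width assertion, so the match begins right after the period.
look_behind = re.search(r"(?<=\.).+$", line).group(0)

print(repr(non_capturing))  # '. A Comment'
print(repr(look_behind))    # ' A Comment'
```

The look-behind match still carries the leading space; whether to trim it or let a separate whitespace rule handle it depends on how the lexer treats whitespace.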
Jeff Y

This is hard to answer without knowing the actual tools involved, but consider inverting the logic: use a regex such as /^.+\./ to detect a command, and everything after is a COMMENT.
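
A minimal sketch of that inverted approach, written in Python as a stand-in for the .NET lexer (the function name `split_command_and_comment` and the choice to return `None` when no period is found are assumptions made for illustration):

```python
import re

# Command rule from the answer: everything up to and including a period.
# The greedy .+ runs to the LAST period on the line.
COMMAND = re.compile(r"^.+\.")


def split_command_and_comment(line):
    """Return (command, comment), or None if the line contains no command."""
    match = COMMAND.match(line)
    if match is None:
        return None
    command = match.group(0)      # includes the terminating period
    comment = line[match.end():]  # everything after the command
    return command, comment


line = "BAS=W34 N29 E24 S29$FOP=E6 S6 W6 N6$. A Comment"
command, comment = split_command_and_comment(line)

print(repr(command))  # 'BAS=W34 N29 E24 S29$FOP=E6 S6 W6 N6$.'
print(repr(comment))  # ' A Comment'
```

Because `.+` is greedy, a comment that itself contains a period would be partly absorbed into the command; a pattern such as `^[^.]+\.` would stop at the first period instead, if that matches the intended grammar.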

davir