I am using this simple regex for my plain text files:

\([A-Za-z]*\)

I get matches like:

...(test)..., ...(Next)..., ...(Warning)... and so on

Is it possible to exclude matches on lines which start with #? For example:

 #  Caught Stealing from Something (Warning)   (3300-3599)

I have tried a lot and came up with this:

(?:^\s*?#.*)|(\([A-Za-z0-9\s\-]*\))

Now I need to display only the matches of the first capturing group, but I don't know how. Can someone give me a tip?

Jon Clements
FalloutBoy
  • Perl regex? Do you have grep? I'd break the problem in two: `grep FILE -Pe "^[^#]" | grep -shoPe "\(\w+\)"` – Alex Jul 05 '15 at 20:16
  • Don't know, i am working with eclipse on simple txt files. I think i can't use your solution in eclipse... – FalloutBoy Jul 05 '15 at 22:06
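For readers who are not tied to Eclipse, the two-step idea from the comment above (drop the commented lines first, then search what is left) can be sketched in Python. This is only an illustration under assumptions: the function name and the sample text are hypothetical, and unlike the grep command it also skips lines where whitespace precedes the #, because the example line in the question starts with a space.

```python
import re

# The simple pattern from the question: letters only inside parentheses.
ITEM = re.compile(r"\([A-Za-z]*\)")

def items_outside_comment_lines(text):
    """Collect (word) items from every line that is not a # comment line."""
    found = []
    for line in text.splitlines():
        # Step 1: skip lines whose first non-blank character is '#'.
        if line.lstrip().startswith("#"):
            continue
        # Step 2: search the remaining lines with the simple regex.
        found.extend(ITEM.findall(line))
    return found

# Hypothetical sample text, not taken from the original files.
sample = (
    "a (test) b (Next)\n"
    " #  Caught Stealing from Something (Warning)   (3300-3599)\n"
    "c (Warning)\n"
)
print(items_outside_comment_lines(sample))
```

Splitting the work this way keeps each regex trivial, at the cost of needing a tool that can filter lines before searching.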

1 Answer

This is not as straightforward as it looks.

Normally you'd use lookarounds for something like this, but your case is pretty unorthodox. You want results that are not tied to a line (any line can contain any number of them) combined with a requirement that applies to the whole line (the absence of a leading #).

A lookaround-based pattern will normally be line-specific and give only one result per line.

My solution:

\s*?#.*(?:\n|$)|(\([A-Za-z0-9\s\-]*\))

You won't avoid having your results within capture groups, I'm afraid. In this case, all of your results will be in $1. The idea is that you first describe a commented line and, if that alternative doesn't match, you describe what the items you're interested in look like. Since the commented-line alternative comes first, it consumes the rest of the commented line, so the items within it are never tested against the second alternative.
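As a rough illustration of reading only $1, here is a minimal Python sketch that runs the pattern above and keeps just the group 1 captures. It assumes Python's `re` module rather than Eclipse's search dialog, and the function name and sample text are hypothetical.

```python
import re

# The answer's pattern: the first alternative consumes a comment from '#'
# to the end of the line; the second captures a parenthesised item in group 1.
PATTERN = re.compile(r"\s*?#.*(?:\n|$)|(\([A-Za-z0-9\s\-]*\))")

# Hypothetical sample text, not taken from the original files.
SAMPLE = (
    "first line (test) and (Next)\n"
    " #  Caught Stealing from Something (Warning)   (3300-3599)\n"
    "last line (content123)\n"
)

def captured_items(text):
    """Return only the group-1 captures; comment-branch matches have group 1 unset."""
    return [m.group(1) for m in PATTERN.finditer(text) if m.group(1) is not None]

print(captured_items(SAMPLE))  # ['(test)', '(Next)', '(content123)']
```

The regex still matches the commented line as a whole, which is why the full matches include it; only the check on group 1 separates wanted items from skipped comments. Note that without a `^` anchor the first alternative starts at any # in a line, not only at a leading one.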

I also changed your regex to allow spaces, dashes and digits, so that it covers more cases such as (content123) and the like. If you don't want that, discard the changes and use \s*?#.*(?:\n|$)|(\([A-Za-z]*\))

There are definitely some other ways to do this, but this is the one I had the most success with.

Andris Leduskrasts
  • This is actually pretty nice. Could you just explain what do you need the `|.` for? It seems to me that without it you will make a lot less matches, but still you will have all the matches that contain non-empty value for the $1 group. – ndnenkov Jul 05 '15 at 21:09
  • It matches but doesn't capture (because of `?:`), but you're right, you actually don't need it the way I wrote it anymore. I used it to describe the character jumps but the regex tries that on its own because I didn't specify the start or end of the line or string. – Andris Leduskrasts Jul 05 '15 at 21:13
  • Unfortunately it doesn't work, and I don't understand what you are saying; I can only build simple regexes. Your solution also matches lines which start with #, but I want to ignore these lines completely. I only need matches which have one word in (), like (text) – FalloutBoy Jul 05 '15 at 22:11
  • Yes, that's intended. You need to use what are called **capture groups**, and you need to refer to them differently, normally with $1. What text editor/tool are you using? There isn't a solution where the matches alone will give you the answer you want. – Andris Leduskrasts Jul 05 '15 at 23:08
  • Ah, OK, thanks Andris. I am using Eclipse. What would the expression look like with capture groups, or isn't it possible with Eclipse? – FalloutBoy Jul 06 '15 at 06:28
  • It is, i'm browsing some SO questions and there are some examples, [this](http://stackoverflow.com/a/1372766/4635642) for instance. So it is really `$1`, but you have to fiddle around and test it a bit to understand how to achieve your goal. – Andris Leduskrasts Jul 06 '15 at 06:31
  • ok, i understand. I've tested it in eclipse and i can use $1 for the current match, but this doesn't help me. I don't know how to use it for my solution, can you help me please? `(\([A-Za-z]*\))[^\^#]` – FalloutBoy Jul 08 '15 at 11:24