Exclude Matches which start with '#'

Question

I am using this simple regex for my plain text files:

\([A-Za-z]*\)

I get matches like:

...(test)..., ...(Next)..., ...(Warning)... and so on

Is it possible to exclude matches which start with #? For example:

 #  Caught Stealing from Something (Warning)   (3300-3599)

I have tried a lot and came up with this:

(?:^\s*?#.*)|(\([A-Za-z0-9\s\-]*\))

Now I need to display only the matches of the 1st capturing group, but I don't know how. Can someone give me a tip?

Perl regex? Do you have grep? I'd break the problem in two: `grep FILE -Pe "^[^#]" | grep -shoPe "\(\w+\)"` – Alex, Jul 05 '15 at 20:16
Don't know, I am working with Eclipse on simple txt files. I think I can't use your solution in Eclipse... – FalloutBoy, Jul 05 '15 at 22:06
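
For readers who can run a script outside Eclipse, the two-step idea from the comment above could look roughly like this in Python: first drop the commented lines, then apply the question's original pattern to what is left. This is only a sketch. The function name `items_outside_comments` is made up for illustration, and treating a `#` preceded by whitespace as a commented line is an assumption based on the sample line in the question.

```python
import re

# The question's original pattern: a parenthesized run of letters.
ITEM = re.compile(r"\([A-Za-z]*\)")


def items_outside_comments(text):
    """Collect parenthesized words from lines that do not start with '#'."""
    found = []
    for line in text.splitlines():
        # Assumption: leading whitespace before '#' still marks a commented line.
        if line.lstrip().startswith("#"):
            continue
        found.extend(ITEM.findall(line))
    return found


sample = (
    "a (test) b (Next)\n"
    " #  Caught Stealing from Something (Warning)   (3300-3599)\n"
    "(Warning) x (3300-3599)\n"
)
print(items_outside_comments(sample))  # ['(test)', '(Next)', '(Warning)']
```

Splitting the work this way keeps each step simple, at the cost of needing a tool that can chain two operations.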

Andris Leduskrasts · Answer 1 · 2015-07-05T21:16:42.777

1

This is not as straightforward as it looks.

Normally you'd use lookarounds for something like this, but your case is pretty unorthodox. You want results that are not tied to a line (any line can contain any number of them), subject to a requirement that is tied to the line (the absence of #).

Lookarounds will normally be line-specific and only have one result.

My solution:

\s*?#.*(?:\n|$)|(\([A-Za-z0-9\s\-]*\))

You won't avoid having your results in capture groups, I'm afraid. In this case, all of your results will be in $1. The idea is that you first describe the possibility of a commented line, and if that isn't met, you describe what the items you're interested in could look like. Since the commented-line option comes first, the items within such a line are consumed by it and never tested against the second alternative.

I also extended your regex to allow spaces, dashes and numbers, so it covers more cases such as (content123) and the like. If you don't want that, discard the changes and use \s*?#.*(?:\n|$)|(\([A-Za-z]*\))
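
To make the "results are in $1" idea concrete, here is a small Python sketch that runs the pattern and keeps only group 1. It is an illustration under assumptions, not a recipe for any particular editor: the helper name `extract_items` and the sample text are hypothetical, and the pattern is the broader variant with spaces, dashes and numbers allowed.

```python
import re

# First alternative: swallow a commented line without capturing anything.
# Second alternative: capture a parenthesized item as group 1.
PATTERN = re.compile(r"\s*?#.*(?:\n|$)|(\([A-Za-z0-9\s\-]*\))")


def extract_items(text):
    """Return group 1 of every match, skipping matches that only consumed a '#' line."""
    return [m.group(1) for m in PATTERN.finditer(text) if m.group(1) is not None]


sample = (
    "Intro (test) and (Next)\n"
    " #  Caught Stealing from Something (Warning)   (3300-3599)\n"
    "After (Warning) end\n"
)
print(extract_items(sample))  # ['(test)', '(Next)', '(Warning)']
```

The commented line still produces a match, but its group 1 is empty, so filtering on the group is what removes it from the results.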

There are definitely some other ways to do this, but this is the one I had the most success with.

edited Jul 05 '15 at 21:16

answered Jul 05 '15 at 20:53

This is actually pretty nice. Could you just explain what you need the `|.` for? It seems to me that without it you will make far fewer matches, but you will still have all the matches that contain a non-empty value for the $1 group. – ndnenkov Jul 05 '15 at 21:09
It matches but doesn't capture (because of `?:`), but you're right, you actually don't need it the way I wrote it anymore. I used it to describe the character jumps but the regex tries that on its own because I didn't specify the start or end of the line or string. – Andris Leduskrasts Jul 05 '15 at 21:13
Unfortunately it doesn't work, and I don't understand what you are saying; I can only build simple regexes. Your solution also matches lines which start with #, but I want to completely ignore these lines. I only need matches which have one word in (), like (text) – FalloutBoy Jul 05 '15 at 22:11
Yes, that's intended. You need to use what are called **capture groups**, and you need to refer to them separately, normally with $1. What text editor/tool are you using? There isn't a solution where the matches alone will give you the wanted answer. – Andris Leduskrasts Jul 05 '15 at 23:08
Ah, OK, thanks Andris. I am using Eclipse. What would the expression look like with capture groups, or isn't it possible with Eclipse? – FalloutBoy Jul 06 '15 at 06:28
It is. I'm browsing some SO questions and there are some examples, [this](http://stackoverflow.com/a/1372766/4635642) for instance. So it really is `$1`, but you have to fiddle around and test it a bit to understand how to achieve your goal. – Andris Leduskrasts Jul 06 '15 at 06:31
OK, I understand. I've tested it in Eclipse and I can use $1 for the current match, but this doesn't help me. I don't know how to use it for my solution, can you help me please? `(\([A-Za-z]*\))[^\^#]` – FalloutBoy Jul 08 '15 at 11:24
