Regex to capture everything except the text that is coherent

Question

I have this string and other ones like it:

<a href='/webapps/alrn-atomiclearning-bb_bb60/atomic/view.jsp?courseId=@X@course.pk_string@X@&contentId=@X@content.pk_string@X@&tt=Using+the+course+calendar&st=Blackboard+Learn%E2%84%A2+9.1+Instructor+-+Additional+Features+Training&d=00:02:09&tid=84425&sid=2389'><img src='/webapps/alrn-atomiclearning-bb_bb60/images/icon_play_UnlockedTutorial.png' alt='play icon'>&nbsp;Using the course calendar</a><br/>Duration: (00:02:09)

I'm trying to come up with a regex to capture everything EXCEPT the coherent labels that begin after the &nbsp; and end just before the </a><br/>

So, for example, I would capture everything else, delete it, and be left with only:

Using the course calendar

I've tried multiple variations in Rubular but can only match up to the &nbsp;. Trying [^a-zA-Z|^\s]*<\/a>.* to skip every word character and whitespace up to the </a> does not work.

Thanks.

I'm just trying to do this in Notepad++, to strip everything I don't need so that the output of this SQL query looks neater for the person who asked for the report. – Christopher Bruce, Oct 30 '13 at 14:24

Patrick Allwood · Accepted Answer · 2013-10-30T15:10:09.140

1

Using a lookahead and a lookbehind - the two sections in brackets. Modify the character class in the middle to capture everything you want to select.

(?<=>&nbsp;)[a-zA-Z\s]+(?=<\/)
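As a rough illustration of the lookbehind/lookahead idea, here is a minimal Python sketch. It assumes the label always follows a literal `&nbsp;` entity, as in the sample string from the question; the `row` value is a shortened, made-up version of that string, and the variable names are only for illustration.

```python
import re

# Shortened stand-in for one row of the report (href abbreviated, hypothetical).
row = ("<a href='/webapps/view.jsp?tid=84425'>"
       "<img src='/images/icon.png' alt='play icon'>"
       "&nbsp;Using the course calendar</a><br/>Duration: (00:02:09)")

# Lookbehind: the match must be preceded by ">&nbsp;".
# Lookahead: the match must be followed by "</".
# Neither lookaround is part of the matched text itself.
pattern = re.compile(r"(?<=>&nbsp;)[a-zA-Z\s]+(?=</)")

labels = pattern.findall(row)
print(labels)
```

Only the label itself ends up in `labels`; widening the character class in the middle is what lets it pick up labels with digits or punctuation.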

Edit:

([\s\w\d\S\W\D]+)((?<=>&nbsp;)[a-zA-Z\s]+(?=<\/))\K([\s\w\d\S\W\D]+)

Ultimately this creates three match groups: the part before what you want to keep, the part you want to keep, and the part after it. I'm not sure how, or indeed whether, you can select multiple matches as if they were a single match.
I'd still go with selecting what you're actually after, if possible.
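The "before / keep / after" idea can also be sketched as a single substitution that matches the whole line and replaces it with only the captured label. This is a swapped-in simplification, not the exact pattern above: it assumes one row per line, a literal `&nbsp;` before the label, and no `<` inside the label. The rows are shortened, made-up examples; the second label is the one mentioned in the comments as being missed by the letters-only class.

```python
import re

# Hypothetical, shortened rows; ids and durations are invented for illustration.
rows = [
    "<a href='/view.jsp?tid=84425'><img src='/icon.png' alt='play icon'>"
    "&nbsp;Using the course calendar</a><br/>Duration: (00:02:09)",
    "<a href='/view.jsp?tid=84426'><img src='/icon.png' alt='play icon'>"
    "&nbsp;Moving with the keyboard pt. 2</a><br/>Duration: (00:01:30)",
]

# ^.*&nbsp;   everything up to and including the entity (discarded)
# ([^<]*)     the label: any run of characters that are not "<" (kept)
# </a>.*$     the closing tag and the rest of the line (discarded)
keep_label = re.compile(r"^.*&nbsp;([^<]*)</a>.*$")

# Replace each whole line with just the captured label.
cleaned = [keep_label.sub(r"\1", row) for row in rows]

for label in cleaned:
    print(label)
```

Using `[^<]*` rather than `[a-zA-Z\s]+` is a design choice here: it keeps labels containing digits or periods without having to enumerate every allowed character. The same find-and-replace-with-group-1 approach should carry over to an editor's regex Replace dialog, though the exact syntax there is not verified.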

edited Oct 30 '13 at 15:10

answered Oct 30 '13 at 14:27


This is capturing the opposite of what I need. I need everything that isn't the text that this regex captures, so I can delete it and be LEFT with just the normal phrases. – Christopher Bruce Oct 30 '13 at 14:51
In your question you state that you want to then delete everything you select and be left with what you are actually after... Why not just actually select the normal phrases? Why is this not appropriate? – Patrick Allwood Oct 30 '13 at 14:56
Because I can't actually do anything with that in Notepad++. I can mark the selections but cannot do a "Copy All Marked". – Christopher Bruce Oct 30 '13 at 14:59
Additionally, the provided regex does not capture everything. There are 700 lines to go through and it only gets about 653. There are some phrases it doesn't catch, for example: Moving with the keyboard pt. 2. I modified the regex to capture that, but then it only finds 13 matches and doesn't get the other 600-some-odd. I think it would just be easier to find what you know will be there every time, up to the point right after the &nbsp; and right before the </a>... – Christopher Bruce Oct 30 '13 at 15:06
See edit. Replace the ([\s\w\d\S\W\D]+) groups with a regex to match everything imaginable. – Patrick Allwood Oct 30 '13 at 15:10
I was able to modify the group and mix and match with your edit above to get what I needed. Thanks! – Christopher Bruce Oct 30 '13 at 15:24
