Starting with the following string:

and worda1 worda2 ... wordan and wordb1 wordb2 ... wordbn

The ... is not literal, but means that other words could also be there. And the words could be anything but 'and'.

I'd like to capture

wordb1 wordb2 ... wordbn

The problem with the regexes I've written so far is that I've used \w, which also matches the 'and' and results in a greedy capture. Lookahead and lookbehind don't work either because of the arbitrary number of words that need to be captured.

Edit: here's an example:

and everyone went to the park and nobody was left at home

should capture:

nobody was left at home

The regex cannot hardcode the phrase "nobody was left at home", because it needs to capture any arbitrary sequence of words other than "and".

Even better:

and it was morning and everyone went to the park and nobody was left at home

should capture:

nobody was left at home

The big picture is that I'd like to capture only up to the first "and", starting from the right.

I could write some code to do this, but I'm wondering whether there is a way to do it with a regex.

I'm using Python re, but open to other flavors of regex.

Thanks for any help.

  • Not quite sure what your question is just yet.... Could the regex you're looking for be as simple as capture all words after the second 'and'? If not, can you explain further what you're doing. – John Bustos Oct 26 '16 at 19:27
  • Could you formulate the requirements? Match consecutive words that have a numeric suffix that is incremented successively? Then no regex will help. – Wiktor Stribiżew Oct 26 '16 at 19:28
  • What do you mean by `word`? If it is the literal string, you could use `word\d*`. – Federico Piazza Oct 26 '16 at 19:29
  • I added a couple examples. As I said above, word = anything but "and". – user3750352 Oct 26 '16 at 21:36

1 Answer

This should do it:

/(?:.* and )?(.+)/

Note that this matches the entire line, but the captured match will be the part you want. A working example is here.
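
Since the question mentions Python's re module, here is a minimal sketch of how the pattern above might be applied there. The helper name `after_last_and` is hypothetical and only used for illustration; the `/.../` delimiters are dropped because Python does not use them.

```python
import re

# Pattern from the answer, written as a Python raw string.
# The greedy ".*" backtracks to the LAST " and " on the line.
PATTERN = re.compile(r"(?:.* and )?(.+)")

def after_last_and(line):
    """Return the text after the last ' and ' (hypothetical helper).

    Returns the whole line if it contains no ' and ', and None for an
    empty line, because the capture group needs at least one character.
    """
    match = PATTERN.match(line)
    return match.group(1) if match else None

print(after_last_and("and everyone went to the park and nobody was left at home"))
print(after_last_and("and it was morning and everyone went to the park and nobody was left at home"))
```

Both calls should give back only the words after the final "and"; the rest of the line is still matched, but it stays outside the capture group.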

There are a few caveats though:

  1. This assumes that there is one sentence per line.
  2. This will match an entire line when it doesn't have the word 'and' in it. Perhaps that's what you want though.
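  3. This assumes the last 'and' isn't the very first word of the line, because the pattern requires a space before it.
  4. This also assumes the very last word of the line isn't 'and', because the pattern requires a space after it.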
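
If caveats 3 and 4 matter for your input, one possible variation is to use word boundaries (`\b`) instead of literal spaces around 'and'. This is only a sketch and not part of the pattern above: the helper name `tail_after_and` is hypothetical, and it assumes that an empty string is an acceptable result when nothing follows the last 'and'. It is case-sensitive, so 'And' is not treated as a separator.

```python
import re

# Variation on the answer's pattern: \b matches at word boundaries, so "and"
# is recognised at the start or end of the line but not inside words such
# as "band" or "sandy".
BOUNDARY_PATTERN = re.compile(r"(?:.*\band\b)?\s*(.*)")

def tail_after_and(line):
    """Return the words after the last standalone 'and' (hypothetical helper)."""
    # Every part of the pattern is optional, so match() always succeeds
    # on a single line of text.
    return BOUNDARY_PATTERN.match(line).group(1)

print(tail_after_and("and nobody was left at home"))    # 'and' is the first word
print(tail_after_and("the band played and we danced"))  # "band" is not a separator
print(repr(tail_after_and("a and b and")))              # nothing follows the last 'and'
```

As with the original pattern, a line without a standalone 'and' is captured in full.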
seane