17

The Rust regex crate states:

This crate provides a native implementation of regular expressions that is heavily based on RE2 both in syntax and in implementation. Notably, backreferences and arbitrary lookahead/lookbehind assertions are not provided.

As of this writing, "rust regex lookbehind" comes back with no results from DuckDuckGo.

I've never had to work around this before, but I can think of two approaches:

Approach 1 (forward)

  1. Iterate over .captures_iter() for the pattern I want to use as lookbehind.
  2. Match the thing I actually wanted to match between captures. (forward)

Approach 2 (reverse)

  1. Match the pattern I really want to match.
  2. For each match, look for the lookbehind pattern until the end byte of a previous capture or the beginning of the string.

Not only does this seem like a huge pain, it also seems like a lot of edge cases are going to trip me up. Is there a better way to go about this?

Example

Given a string like:

"Fish33-Tiger2Hyena4-"

I want to extract ["33-", "2", "4-"] iff each one follows a string like "Fish".

Bergi
bright-star
  • 1
    Why not use `[0-9]+-?`? The best method to emulate a lookbehind (when you need it) is using optional capturing groups. – Wiktor Stribiżew Jun 22 '16 at 16:14
  • 2
    Could you please think of a more appropriate example? – Wiktor Stribiżew Jun 22 '16 at 16:19
  • @WiktorStribiżew how do optional capturing groups emulate a lookbehind? The idea of a lookbehind is to see if it's there and, if it is, to match just the position, not the letters themselves. Surely an optional capture group would match the letters/characters when they exist? – barlop Jun 22 '16 at 16:20
  • 1
    I suppose you could emulate a lookbehind, by Matching but not capturing. – barlop Jun 22 '16 at 16:21
  • Yes, but then you can check if the group matched or not. If it matched, there is the text. If not, there is no such text. Sure, there are limitations and it is more like a `\K` workaround. – Wiktor Stribiżew Jun 22 '16 at 16:21
  • @WiktorStribiżew you write "there is the text". <-- He or a person using a lookbehind, Does Not Want The Text *ever* Hence The LookBehind. – barlop Jun 22 '16 at 16:22
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/115331/discussion-between-barlop-and-wiktor-stribizew). – barlop Jun 22 '16 at 16:23
  • 1
    *iff each one follows a string like "Fish"* — perhaps you can describe what defines "like Fish"? Maybe add something to the input string that **shouldn't** be matched but would be matched by `[0-9]+-?`. Right now, it seems like `[A-Z][a-z]+([0-9]+-?)` and grabbing the grouped expression would work. – Shepmaster Jun 22 '16 at 18:08
  • @Shepmaster I was actively trying to ask a question more general than a specific regex pattern issue, since I wanted to know what to do about this in Rust in general, not just for the particular application of a lookbehind I ran into. I put the (weak) regex example in only because the question form was urging me to do so. – bright-star Jun 22 '16 at 18:11
  • 2
    And I think a general question is fine, but they are usually driven by concrete examples. So far, it appears that the "real answer" to your question is "Don't emulate this behavior because you don't need it". Presumably that's not your ideal end state. The question form encourages you to do so *because it's useful* and would have prevented you from having to deal with most of these comments ^^. – Shepmaster Jun 22 '16 at 18:14

2 Answers

16

Without a motivating example, it's hard to usefully answer your question in a general way. In many cases, you can substitute lookaround operators with two regexes---one to search for candidates and another to produce the actual match you're interested in. However, this approach isn't always feasible.

If you're truly stuck, then your only option is to use a regex library that supports these features. Rust has bindings to a couple of them:

There is also a more experimental library, fancy-regex, which is built on top of the regex crate.

BurntSushi5
3

If you have a regex application where you have a known, consistent pattern that you want to use as lookbehind, another workaround is to use .split() with the lookbehind-matching pattern as the argument (similar to the idea mentioned in the other answer). That will at least give you the substrings that sit directly next to each match of the lookbehind pattern.

I don't know about performance guarantees regex-wise but this at least means that you can do a lookbehind-free regex match on the split result either N times (for N splits), or once on the concatenated result as needed.

Community
bright-star
  • I'm not usually an advocate for commenting when you downvote, but I'm not sure why this answer is "not useful" in someone's opinion. If I knew advanced regex concepts like lookbehind better, I'd weigh in with my own voting. ^_^ – Shepmaster Jul 11 '16 at 18:44
  • I have a gut feeling that this approach is A Bad Idea but I too want to know why ;) – bright-star Jul 11 '16 at 18:52
  • Presumably because someone felt that this was not "*the **most** sensible way to emulate lookbehind behavior in Rust regex*" – JDB Dec 11 '17 at 19:41