
I give up. I do not know regex, and I've spent the last 2 days trying to learn enough just to perform one simple task. So I'll suffer all the downvotes to ask this simple (some might say stupid) question.

I have a string that looks like this: `path/to/the/file/text_I_want_tokeep_loremipsumdolorsitamet`

In other words, I want what's between the 4th "/" and the 4th "_", i.e. `text_I_want_tokeep`.

An answer with no explanation is greatly appreciated. An answer with an explanation is appreciated even more. :)

Thanks!

Steve
  • So what have you tried so far? – JuanR Sep 22 '17 at 15:56
  • All I've tried is finding "similar" questions here on S.O., then randomly adapting the answers and applying them to my data, trying to figure out what does what. I just now reached the conclusion that I couldn't figure it out; I need to start on page 1 of a regex tutorial. I'll do that, but I need to solve this problem faster than that will take. – Steve Sep 22 '17 at 15:58
  • @Steve Do you want `loremipsumdolorsitamet` as your output? If so, you should use `explode`, `split`, or the equivalent method in the language you are using. – Sahil Gulati Sep 22 '17 at 16:00
  • @Sahil: no, sorry. I wasn't clear. I want `text_I_want_tokeep` – Steve Sep 22 '17 at 16:00
  • According to your requirements, [this](https://regex101.com/r/bwh9uN/1) should suffice. Judging by your example string, [this can be even simpler](https://regex101.com/r/bwh9uN/2). – Wiktor Stribiżew Sep 22 '17 at 16:01
  • Thanks Wiktor!! I wish I could give you an extra upvote for introducing me to regex101.com!! – Steve Sep 22 '17 at 16:02
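Sahil's comment points at a split-based alternative that needs no regex at all. Below is a minimal Python sketch of that idea, assuming (as in the example) that the string has at least four `/`, that no `_` appears before the 4th `/`, and that the tail has at least four `_`-separated pieces. The function name `between_4th_slash_and_4th_underscore` is hypothetical.

```python
def between_4th_slash_and_4th_underscore(s: str) -> str:
    # Split on "/" at most 4 times; piece [4] is everything after the 4th "/".
    # Raises IndexError if the string has fewer than four slashes.
    tail = s.split("/", 4)[4]
    # The first four "_"-separated pieces of the tail lie before the 4th "_".
    return "_".join(tail.split("_")[:4])


s = "path/to/the/file/text_I_want_tokeep_loremipsumdolorsitamet"
print(between_4th_slash_and_4th_underscore(s))  # text_I_want_tokeep
```

Splitting is easy to read when the delimiter counts are fixed; a regex becomes more attractive once the rules get fuzzier.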

2 Answers


This should work:

.+/(.+)_.+

It greedily skips characters up to the last slash, then captures everything up to the last underscore that still has other characters after it. The text you want is in capture group 1; the full match covers the whole string.
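A quick check of what this pattern returns, using Python's standard `re` module purely as an illustration (the second sample string is made up to show the greedy behavior):

```python
import re

s = "path/to/the/file/text_I_want_tokeep_loremipsumdolorsitamet"

m = re.search(r".+/(.+)_.+", s)
# The full match (group 0) is the entire string; the wanted part is group 1.
print(m.group(0) == s)  # True
print(m.group(1))       # text_I_want_tokeep

# Because .+ is greedy, group 1 runs to the LAST "_" that still has a
# character after it, not necessarily to the 4th "_".
other = "path/to/the/file/text_I_want_tokeep_extra_suffix"
print(re.search(r".+/(.+)_.+", other).group(1))  # text_I_want_tokeep_extra
```

So this pattern matches the requirement for the example string, but on inputs with more underscores it captures more than "up to the 4th `_`".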

JuanR
  • Thanks Juan! I now have THREE answers, all with explanations, and they all seem to work. I'll learn a lot from this one. Really appreciate it! – Steve Sep 22 '17 at 16:07
  • @Steve: When studying this pattern, pay attention to [*backtracking*](https://stackoverflow.com/questions/9011592/in-regular-expressions-what-is-a-backtracking-back-referencing). – Wiktor Stribiżew Sep 22 '17 at 16:20
  • @Sahil, @Juan: this answer actually gives me `text_I_want_tokeep_loremipsumdolorsitamet` instead of `text_I_want_tokeep` – Steve Sep 22 '17 at 16:27
  • I don't think so @Steve. I tested it myself. :-) – JuanR Sep 22 '17 at 17:30
  • I see my answer was edited. I am not sure why the change was accepted, but the slash does NOT need to be escaped. – JuanR Sep 22 '17 at 17:32
  • Thanks everyone. It'll take me a while to understand all these answers. I'm going to finish the task first, and then come back to understanding it. You all have given me some excellent customized learning material. Thanks. – Steve Sep 22 '17 at 17:34

You can try something like this.


Regex: `^(?:[^\/]+\/){4}\K(?:[^_]+_){3}[^_]+`

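1. `^` anchors the match at the start of the string.

2. `(?:[^\/]+\/){4}`: here `[^\/]+` matches one or more characters other than `/`, `\/` matches the `/` itself, and `{4}` repeats this group four times, consuming `path/to/the/file/`.

3. `\K` resets the start of the reported match, so everything matched so far is left out of the result.

4. `(?:[^_]+_){3}[^_]+`: here `[^_]+` matches one or more characters other than `_`. The group `(?:[^_]+_)` is repeated three times (`text_I_want_`), and the final `[^_]+` matches the fourth segment (`tokeep`), stopping right before the 4th `_`.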
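Steps 2 and 4 can be checked on their own. Python's standard `re` has no `\K`, so this sketch simply slices off the prefix that step 2 matched before applying step 4; the `\/` escapes are dropped because Python does not need them.

```python
import re

s = "path/to/the/file/text_I_want_tokeep_loremipsumdolorsitamet"

# Step 2: four repetitions of "non-slash characters, then a slash".
prefix = re.match(r"(?:[^/]+/){4}", s).group()
print(prefix)  # path/to/the/file/

# Step 4: three "segment + underscore" repetitions, then one more segment
# that stops before the next underscore.
rest = s[len(prefix):]
wanted = re.match(r"(?:[^_]+_){3}[^_]+", rest).group()
print(wanted)  # text_I_want_tokeep
```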


Note: Some languages do not support `\K`. In that case, put parentheses around the expression written after `\K` to make it a capturing group, and read that group instead of the whole match.
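Python's standard `re` module is one of the flavors without `\K`, so here is a sketch of the capture-group variant described in the note (the same idea Wiktor uses with `str_match` in R). The name `PATTERN` is just for illustration.

```python
import re

# \K replaced by a capturing group around the part we want to keep.
PATTERN = re.compile(r"^(?:[^/]+/){4}((?:[^_]+_){3}[^_]+)")

s = "path/to/the/file/text_I_want_tokeep_loremipsumdolorsitamet"
m = PATTERN.match(s)
print(m.group(1) if m else None)  # text_I_want_tokeep

# Fewer than four slashes: the prefix cannot match, so match() returns None.
print(PATTERN.match("only/two/slashes_here"))  # None
```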

Sahil Gulati
  • @Steve Try this one. – Sahil Gulati Sep 22 '17 at 16:35
  • Thanks Sahil. I'm going to accept this one. It works in regex101. And thanks especially for the very clear explanation, which will help me learn. I now have to figure out why stringr (R package) won't accept that pattern, but that's a whole different problem, and one I can figure out on my own! :) – Steve Sep 22 '17 at 16:46
  • @Steve You're welcome, my friend. Glad to help. :) – Sahil Gulati Sep 22 '17 at 16:48
  • @Steve: In R, use `regmatches(x, regexpr("^(?:[^/]+/){4}\\K(?:[^_]+_){3}[^_]+", x, perl=TRUE))`. Or `str_match(x, "^(?:[^/]+/){4}((?:[^_]+_){3}[^_]+)")` – Wiktor Stribiżew Sep 22 '17 at 17:13