
I want to remove all numbers that are immediately followed by a ")". My strings look like this:

Gmünd 5) 6) 7)
Hermagor am See 3)

So I'd like to have the result:

Gmünd
Hermagor

I think the solution must involve negative lookaheads, but I am not really sure how to do that.

Lenn
  • Sorry, I was unclear! This is what I meant. There might be spaces in the names, so trimming everything after the first whitespace does not work :/ – Lenn Mar 10 '22 at 09:55
  • What's the logic exactly? Why does the second line lose "am See" if you're just removing numbers and parentheses? – camille Mar 11 '22 at 03:56
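For the question's two sample strings, and assuming the only digits are the ones directly followed by ")", a plain pattern with no lookaround already works, because the ")" can simply be consumed as part of the match. A minimal R sketch:

```r
# Sample strings from the question. "Gm\u00fcnd" is "Gmünd" written with a
# Unicode escape so the script stays plain ASCII.
strings <- c("Gm\u00fcnd 5) 6) 7)", "Hermagor am See 3)")

# Remove any run of whitespace, then one or more digits, then a literal ")".
# No lookaround: the ")" is matched (and therefore removed) together with the digits.
cleaned <- gsub("\\s*\\d+\\)", "", strings, perl = TRUE)
print(cleaned)
```

As written, this keeps "am See", since only the numbers and their parentheses are removed.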

1 Answer


If your strings contain no other digits, a lookaround is not needed. If they do, especially in the context of ( and ), a lookaround, specifically a negative lookbehind, is needed:

gsub("(?<!\\()\\s?\\d+\\)", "", strings, perl = TRUE)
[1] "Gmünd"            "Hermagor"         "Tegernsee (4)"    "Some (stuff) 444"

How this works:

  • (?<!\\() a negative lookbehind asserting that there is no literal ( immediately before ...
  • \\s?\\d+\\) ... an optional whitespace character, followed by one or more digits, followed by a literal )
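To see what the lookbehind contributes, here is a short sketch that assembles the same pattern from the two parts listed above and runs it with and without the lookbehind. The variable names are illustrative, not from the answer.

```r
# The two parts of the answer's pattern.
not_after_paren <- "(?<!\\()"      # negative lookbehind: previous char is not "("
number_paren    <- "\\s?\\d+\\)"   # optional whitespace, one or more digits, literal ")"

with_lookbehind    <- paste0(not_after_paren, number_paren)
without_lookbehind <- number_paren

x <- c("Hermagor 3)", "Tegernsee (4)")

# With the lookbehind, "(4)" is protected because "4" is preceded by "(".
res_with <- gsub(with_lookbehind, "", x, perl = TRUE)

# Without it, "4)" is matched and removed, leaving a stray "(" behind.
res_without <- gsub(without_lookbehind, "", x, perl = TRUE)

print(res_with)
print(res_without)
```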

Data:

strings <- c("Gmünd 5) 6) 7)", "Hermagor 3)", "Tegernsee (4)", "Some (stuff) 444")
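One caveat, worth checking against real data: the lookbehind only inspects the single character before the match, so a multi-digit number in parentheses such as "(44)" is still partly removed, because the match can start at the second digit. A sketch of one possible adjustment, assuming such inputs occur, is to forbid a digit as well as "(" before the match with the character class [(\\d]:

```r
strings <- c("Gm\u00fcnd 5) 6) 7)", "Hermagor 3)", "Tegernsee (44)", "Some (stuff) 444")

# Answer's pattern: only "(" is forbidden before the match,
# so "4)" at the end of "(44)" is still removed.
res_orig <- gsub("(?<!\\()\\s?\\d+\\)", "", strings, perl = TRUE)

# Variant: neither "(" nor a digit may precede the match, so "(44)" survives.
res_fixed <- gsub("(?<![(\\d])\\s?\\d+\\)", "", strings, perl = TRUE)

print(res_orig)
print(res_fixed)
```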
Chris Ruehlemann
  • I think this is it!! Thank you so much! :) I know this is a little annoying, but could you very briefly explain what the regex does? I see that the `(?<! ... )` part is the negative lookbehind. I thought a lookahead might be necessary, as my reasoning was: "Match everything that is not followed by a number with a parenthesis right after it." So I do not completely get why a negative lookbehind is necessary here, especially because there is nothing "after" the lookbehind, like the `b` in `(?<!a)b`. – Lenn Mar 10 '22 at 10:03
  • 1
    Have edited answer to include a short explanation. – Chris Ruehlemann Mar 10 '22 at 16:51
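On the lookahead-versus-lookbehind question in the comment: a lookaround is zero-width and can sit on either side of the part that is actually matched. `(?<!a)b` checks the character to the left of the `b`, while `b(?!a)` checks the one to the right. A small sketch with made-up strings:

```r
x <- c("ab", "cb", "ba", "bc")

# (?<!a)b : a "b" that is NOT preceded by "a" (the check looks left of the match).
behind <- grepl("(?<!a)b", x, perl = TRUE)

# b(?!a)  : a "b" that is NOT followed by "a" (the check looks right of the match).
ahead <- grepl("b(?!a)", x, perl = TRUE)

print(data.frame(x, behind, ahead))
```

In the answer's pattern the condition concerns what comes before the number (no "("), which is why a lookbehind is used; the ")" after the number is consumed rather than asserted, because it should be removed as well.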