I have data that looks like this:

library(tidyverse)

df <- tribble(
    ~name, ~value,
    "Jake Lake MLP", 10,
    "Bay May CE", 5,
    "Drake Cake Jr. DSF", 9.1,
    "Sam Ram IR QQQZ", 1
)

I want to trim all the names so that they are:

"Jake Lake",
"Bay May", 
"Drake Cake Jr.",
"Sam Ram IR"

Basically, I want to remove the last space and everything after it.

I tried:

df %>% mutate(name = str_replace(name, "\\s.*$", ""))

But it cuts each name at the first space instead of the last one, so it's not quite what I want.
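A quick base-R sketch of what that attempt actually returns. `sub()` stands in for `str_replace()` here only to avoid package dependencies; both replace just the first match. The vector `x` is simply the question's `name` column copied out for illustration.

```r
# Names from the question's `name` column
x <- c("Jake Lake MLP", "Bay May CE", "Drake Cake Jr. DSF", "Sam Ram IR QQQZ")

# "\\s.*$": the regex engine takes the leftmost match, which begins at the
# FIRST whitespace, and ".*" then consumes everything up to the end
attempt <- sub("\\s.*$", "", x)
print(attempt)  # "Jake" "Bay" "Drake" "Sam"
```

Only the first word survives, which is why the pattern has to be pinned to the final word instead.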

emehex
    You could actually take the regex from the non-accepted answer at http://stackoverflow.com/questions/20497895/regular-expression-in-r-to-remove-the-part-of-a-string-after-the-last-space – Wiktor Stribiżew Oct 19 '16 at 22:51

We can use `sub`:

df %>% 
    mutate(name = sub("\\s+[^ ]+$", "", name))

Or a nearly identical pattern with `str_replace` (a single `\\s` instead of `\\s+`, which makes no difference for this data):

df %>% 
    mutate(name = str_replace(name, "\\s[^ ]+$", ""))
# A tibble: 4 × 2
#            name value
#           <chr> <dbl>
#1      Jake Lake  10.0
#2        Bay May   5.0
#3 Drake Cake Jr.   9.1
#4     Sam Ram IR   1.0

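The pattern matches whitespace (`\\s`) followed by one or more characters that are not a space (`[^ ]+`; `\\S+` would also work) up to the end of the string (`$`), and replaces that match with the empty string "". In the OP's code, `.*` matches any characters at all, so the match starts at the first whitespace rather than the last.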
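A base-R sketch of the `\\S+` variant, checked against the expected names from the question. One difference worth knowing: `[^ ]` excludes only the literal space character, while `\\S` excludes any whitespace, so the two patterns can disagree when a string contains tabs. The tab-containing string `y` is a made-up example, not from the question.

```r
x <- c("Jake Lake MLP", "Bay May CE", "Drake Cake Jr. DSF", "Sam Ram IR QQQZ")
expected <- c("Jake Lake", "Bay May", "Drake Cake Jr.", "Sam Ram IR")

# "\\s+" = one or more whitespace, "\\S+" = one or more non-whitespace,
# "$" = end of string: together, the last whitespace run plus the final word
trimmed <- sub("\\s+\\S+$", "", x)
print(trimmed)
stopifnot(identical(trimmed, expected))

# Hypothetical string whose last separator is a tab, not a space
y <- "Jake Lake\tMLP"
sub("\\s+[^ ]+$", "", y)  # "Jake"      ([^ ] happily matches the tab)
sub("\\s+\\S+$", "", y)   # "Jake Lake" (\\S stops at any whitespace)
```

For the question's data, which only uses single spaces, both patterns give the same result.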

akrun
  • could you explain what's going on with the `+[^ ]+` in the regex? I understand \\s and $... just not the middle piece. - emehex Oct 19 '16 at 22:45
  • @emehex When `^` is the first character inside square brackets, it negates the class, so `[^ ]` matches any character except a space - akrun Oct 19 '16 at 22:47
  • Gotcha. I don't like the recycled `^` ... I just knew it as the opposite of `$`. – emehex Oct 19 '16 at 22:48
  • @emehex Used on its own (outside square brackets), it anchors the match to the start of the string, so its meaning differs - akrun Oct 19 '16 at 22:49
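A small base-R sketch contrasting the meanings of `^` discussed in these comments: outside brackets it anchors the match to the start of the string, as the first character inside brackets it negates the class, and anywhere else inside brackets it is just a literal `^`. The test strings are made up for illustration.

```r
s <- c("Jake Lake", "MLP")

# Outside brackets: "^" anchors to the start of the string
grepl("^J", s)                          # TRUE FALSE

# First inside brackets: "[^ ]" is any character except a space,
# so "^[^ ]+$" means "the whole string contains no space"
grepl("^[^ ]+$", s)                     # FALSE TRUE

# Not first inside brackets: "[a^]" matches a literal "a" or "^"
grepl("[a^]", c("x^y", "abc", "xyz"))   # TRUE TRUE FALSE
```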