
I have an object called kemba_walker that contains this string:

" Kemba Walker PG | #8"

How can I extract Kemba Walker PG from it using stringr?

I think I can use kemba_walker %>% str_extract(""), but I don't know regex, so I have no idea what pattern to pass to the function.

3 Answers


You may use the pipe character | as the marker for where the player name ends:

input <- "  Kemba Walker PG  |  #8"
name <- sub("^\\s*(.*?)\\s*\\|.*$", "\\1", input)
name

[1] "Kemba Walker PG"

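The regex works as follows: ^\\s* skips any leading whitespace, (.*?) lazily captures the player name, and \\s*\\|.*$ matches the whitespace before the pipe, the pipe itself, and the rest of the string. The replacement, which is the second argument to sub, is \1, a backreference to that first capture group. It is written "\\1" in R code because the backslash itself has to be escaped inside an R string.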
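Since the question asks about stringr, here is a minimal sketch of the same capture-group idea using str_match, which returns a matrix with the full match in column 1 and each capture group in the following columns. The pattern is the one above with the trailing .*$ dropped, because str_match does not need to consume the rest of the string.

```r
library(stringr)

input <- "  Kemba Walker PG  |  #8"

# Column 1 holds the whole match; column 2 holds capture group 1, the (.*?) part
m <- str_match(input, "^\\s*(.*?)\\s*\\|")
name <- m[, 2]
name
#[1] "Kemba Walker PG"
```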

Tim Biegeleisen
  • What does `"\\1"` mean? – Howard Baek Sep 26 '19 at 06:07
  • @akrun It's not always clear to me whether something is an _exact_ duplicate. From what I can see, there are many more duplicates on other tags, such as regex and SQL. – Tim Biegeleisen Sep 26 '19 at 06:50
  • I was also hesitant at first because the OP asked for a stringr solution, but then I found the exact pattern with `sub`. Besides, closing a question as a duplicate increases the chance of finding it through Google. – akrun Sep 26 '19 at 06:52

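We can use str_remove from stringr to remove the | character and everything after it (.*), then trim the leftover whitespace with trimws. The | is escaped as \\| because it is a regex metacharacter.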

library(stringr)
trimws(str_remove(str1, "\\|.*"))
#[1] "Kemba Walker PG"
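If you prefer to stay entirely within stringr, a minimal sketch swaps base R's trimws for stringr's str_trim, which also trims both ends by default:

```r
library(stringr)

str1 <- "  Kemba Walker PG  |  #8"

# Drop "|" and everything after it, then trim whitespace on both ends
name <- str_trim(str_remove(str1, "\\|.*"))
name
#[1] "Kemba Walker PG"
```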

Or use str_extract to extract the run of characters other than | from the start (^) of the string

trimws(str_extract(str1, "^[^|]+"))
#[1] "Kemba Walker PG"
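Written in the pipe style the question was attempting, a sketch might look like this (it relies on stringr re-exporting magrittr's %>%, and uses the object name kemba_walker from the question):

```r
library(stringr)  # stringr re-exports magrittr's %>%

kemba_walker <- "  Kemba Walker PG  |  #8"

# Keep the run of non-"|" characters at the start, then trim the surrounding spaces
name <- kemba_walker %>%
  str_extract("^[^|]+") %>%
  str_trim()
name
#[1] "Kemba Walker PG"
```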

Or in base R with only trimws (its whitespace argument requires R >= 3.6.0)

trimws(str1,  whitespace = "\\s*[|].*|\\s*", which = 'both')
#[1] "Kemba Walker PG"

data

str1 <- "  Kemba Walker PG  |  #8"
akrun

We can use sub to remove the "|" and everything after it, then trim the leftover whitespace with trimws:

vec <- "  Kemba Walker PG  |  #8"

trimws(sub("\\|.*", "", vec))
#[1] "Kemba Walker PG"
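A regex-free alternative in base R (a different technique from sub, named plainly here) is to split on the literal | with fixed = TRUE, so the pipe needs no escaping, and keep the first piece. This is a minimal sketch, not part of the original answer:

```r
vec <- "  Kemba Walker PG  |  #8"

# fixed = TRUE treats "|" literally instead of as the regex alternation operator
parts <- strsplit(vec, "|", fixed = TRUE)

# strsplit returns a list with one character vector per input string;
# take the first piece of each and trim the surrounding spaces
name <- trimws(vapply(parts, `[`, character(1), 1))
name
#[1] "Kemba Walker PG"
```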

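As @zx8754 mentions, we can also use read.table, which splits the string on "|"; the name ends up in the first column, V1

read.table(text = vec, sep = "|", strip.white = TRUE, stringsAsFactors = FALSE)$V1
#[1] "Kemba Walker PG"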
Ronak Shah