How do I extract just the player names in R?

Question

I have an object called `kemba_walker`, which contains this string:

" Kemba Walker PG | #8"

How can I extract `Kemba Walker PG` using stringr?

I think I can use `kemba_walker %>% str_extract("")`, but I don't know regex, so I have no idea what pattern to put inside the function!

Can you show a few more entries of your data? What would be the rule for extracting the player name? Do you need the first two words of every entry, the text before `"|"`, or something else? — Ronak Shah, Sep 26 '19 at 06:00
@HowardBaek OK, but does every entry have only two name words? What about someone with a middle name? Would you then only want to capture the first and middle name? — Tim Biegeleisen, Sep 26 '19 at 06:07
Okay guys, to keep things simple, I want ALL the characters coming before `|`. So, I'd want `Kemba Walker PG` — Howard Baek, Sep 26 '19 at 06:10
That way, I can capture the players with a middle name, as @TimBiegeleisen mentioned — Howard Baek, Sep 26 '19 at 06:11
@akrun But just because the OP requested the `stringr` package does not mean that using it is the best answer here. — Tim Biegeleisen, Sep 26 '19 at 06:20
@TimBiegeleisen I agree, but the OP mentioned `stringr` in two places. — akrun, Sep 26 '19 at 06:21
It is a pipe delimited text, so import as such: `read.table(text = " Kemba Walker PG | #8", sep = "|", strip.white = TRUE)` — zx8754, Sep 26 '19 at 06:59

Tim Biegeleisen · Answer 1 · 2019-09-26T06:08:47.780


You may use the pipe as a marker for locating the player name:

input <- "  Kemba Walker PG  |  #8"
name <- sub("^\\s*(.*?)\\s*\\|.*$", "\\1", input)
name

[1] "Kemba Walker PG"

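The regex works by capturing the player name in the lazy group `(.*?)`. The leading `^\\s*` absorbs any spaces before the name, and `\\s*\\|.*$` consumes the spaces before the pipe, the pipe itself, and everything after it. The replacement, which is the second argument to `sub`, is `"\\1"`, a backreference to that first capture group, so the whole string is replaced by just the name.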
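To see the backreference at work on more than one entry, here is a minimal sketch applying the same pattern to a small vector. The roster values are hypothetical (made up for illustration, including a middle-name entry like the one raised in the comments), and `perl = TRUE` is an addition of mine so that the lazy `.*?` is handled by the PCRE engine; the answer above uses R's default engine.

```r
# Hypothetical entries for illustration only; not from the question.
roster <- c("  Kemba Walker PG  |  #8",
            " John Michael Doe SF | #23",
            "Jane Doe SG|#11")

# Group 1, (.*?), captures the text between the leading spaces and the
# optional spaces before "|"; "\\1" puts only that group back.
# perl = TRUE is an assumption of this sketch (PCRE handles the lazy .*?).
player_names <- sub("^\\s*(.*?)\\s*\\|.*$", "\\1", roster, perl = TRUE)
player_names
```

Because `sub` is vectorised over its `x` argument, the same call works unchanged on a whole column of entries.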

edited Sep 26 '19 at 06:08

answered Sep 26 '19 at 06:00

Tim Biegeleisen

What does `"\\1"` mean? – Howard Baek Sep 26 '19 at 06:07

@akrun It's not always clear to me whether something is an _exact_ duplicate. From what I can see, there are many more duplicates on other tags, such as regex and SQL. – Tim Biegeleisen Sep 26 '19 at 06:50
I was also hesitant at first because the OP mentioned a stringr solution, but then I found the exact pattern with `sub`. Besides, closing a question as a dupe increases the chance of finding it through Google. – akrun Sep 26 '19 at 06:52

akrun · Answer 2 · 2019-09-26T16:09:18.727

We can use `str_remove` from stringr to remove the `|` character followed by any other characters (`.*`), then trim the surrounding spaces with `trimws`:

library(stringr)
trimws(str_remove(str1, "\\|.*"))
#[1] "Kemba Walker PG"

Or use `str_extract` to extract the characters other than `|` (`[^|]+`) from the start (`^`) of the string:

trimws(str_extract(str1, "^[^|]+"))
#[1] "Kemba Walker PG"
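Since the question started from `kemba_walker %>% str_extract("")`, here is a sketch of that pipeline filled in with the `^[^|]+` pattern. It swaps `trimws` for stringr's own `str_trim`, and it assumes the installed stringr re-exports magrittr's `%>%` (if yours does not, load magrittr or dplyr as well).

```r
library(stringr)  # assumed to re-export magrittr's %>%

kemba_walker <- " Kemba Walker PG | #8"

# "^[^|]+" keeps the run of non-pipe characters at the start of the string;
# str_trim() then drops the spaces left around the name.
player_name <- kemba_walker %>%
  str_extract("^[^|]+") %>%
  str_trim()

player_name
```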

Or in base R, with only `trimws` and a custom `whitespace` pattern (the `whitespace` argument requires R 3.6.0 or later):

trimws(str1,  whitespace = "\\s*[|].*|\\s*", which = 'both')
#[1] "Kemba Walker PG"
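As a quick sanity check, here is a sketch that runs all three approaches above on a small vector and confirms they agree. The entries are hypothetical, chosen to cover irregular spacing, a middle name, and no spaces around the pipe; they are not from the question.

```r
library(stringr)

# Hypothetical entries for illustration only.
entries <- c("  Kemba Walker PG  |  #8",
             " John Michael Doe SF | #23",
             "Jane Doe SG|#11")

via_remove  <- trimws(str_remove(entries, "\\|.*"))
via_extract <- trimws(str_extract(entries, "^[^|]+"))
# The whitespace argument needs R >= 3.6.0.
via_trimws  <- trimws(entries, whitespace = "\\s*[|].*|\\s*", which = "both")

# Stops with an error if the three results differ.
stopifnot(identical(via_remove, via_extract),
          identical(via_remove, via_trimws))
via_remove
```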

The examples above use this input:

str1 <- "  Kemba Walker PG  |  #8"

Ronak Shah · Accepted Answer · 2019-09-26T07:12:06.847


We can use `sub` to remove the `|` and everything after it, then strip the surrounding spaces with `trimws`:

vec <- "  Kemba Walker PG  |  #8"

trimws(sub("\\|.*", "", vec))
#[1] "Kemba Walker PG"
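"Everything after `|`" means everything after the *first* pipe. The sketch below uses a hypothetical entry with an extra, made-up third field to show that `sub` replaces only the leftmost match, which starts at the first `|`, while the greedy `.*` runs to the end of the string.

```r
# Hypothetical entry with a third pipe-separated field (made up for illustration).
vec2 <- "  Kemba Walker PG  |  #8  |  Charlotte"

# The leftmost match of "\\|.*" begins at the first "|" and the greedy ".*"
# extends to the end, so every later field is removed along with it.
name2 <- trimws(sub("\\|.*", "", vec2))
name2
```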

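As @zx8754 mentions, we can also treat the string as pipe-delimited text and read it with `read.table`. Setting `comment.char = ""` stops `#8` from being discarded as a comment, and the name ends up in the first column, `V1`:

read.table(text = vec, sep = "|", strip.white = TRUE, comment.char = "", stringsAsFactors = FALSE)$V1
#[1] "Kemba Walker PG"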
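The `read.table` route pays off when there are many entries, because each element of `text` becomes one row. A minimal sketch, assuming hypothetical roster lines and column names of my own choosing (`name`, `number`):

```r
# Hypothetical roster lines, made up for illustration; each element is one row.
roster <- c("  Kemba Walker PG  |  #8",
            " John Michael Doe SF | #23")

# comment.char = "" keeps "#8"/"#23" instead of treating them as comments;
# stringsAsFactors = FALSE keeps character columns on R versions before 4.0.
players <- read.table(text = roster, sep = "|", strip.white = TRUE,
                      comment.char = "", stringsAsFactors = FALSE,
                      col.names = c("name", "number"))

players$name
# The jersey field can then be converted to an integer if needed.
jersey <- as.integer(sub("#", "", players$number, fixed = TRUE))
jersey
```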

edited Sep 26 '19 at 07:12

answered Sep 26 '19 at 06:13

Ronak Shah
