
I'm looking to add a column whose value is extracted from a string in another column of the same row.

For example, how could I create a new column that shows the number or text at the very end of PlayerID in this table? I want this:

PlayerID           
Hank Aaron + 7      
Babe Ruth + 5       
Ted Williams + 2i   
Hank Aaron + Outfield
Lou Gehrig + FirstBase

To become this:

PlayerID                 NewColumn 
Hank Aaron + 7            7 
Babe Ruth + 5             5 
Ted Williams + 2i         2i 
Hank Aaron + Outfield     Outfield 
Lou Gehrig + FirstBase    FirstBase

As you can see above, I need everything after the plus sign to go into the new column. Sometimes the value after the plus sign is a number, sometimes it is a mix of digits and letters, and sometimes it is just letters. Thanks in advance!

Ric S
Essentially very similar logic to this one, though maybe not exactly a duplicate: https://stackoverflow.com/questions/10617702/remove-part-of-string-after, possibly this one instead: https://stackoverflow.com/questions/25991824/remove-all-characters-before-a-period-in-a-string – thelatemail Jul 17 '20 at 05:54

3 Answers


You can use a regex to capture everything after the plus (+) sign:

df$newcol <- sub('.*\\+\\s*(.*)$', '\\1', df$PlayerID)
df$newcol
#[1] "7"         "5"         "2i"        "Outfield"  "FirstBase"
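One thing to watch with sub(): when a string contains no "+", the pattern does not match and sub() returns the string unchanged, so that row would get the whole PlayerID instead of an empty value. A minimal sketch of guarding against that with grepl(), using a hypothetical row "Willie Mays" that has no plus sign:

```r
# "Willie Mays" is a hypothetical row with no plus sign
ids <- c("Hank Aaron + 7", "Willie Mays", "Lou Gehrig + FirstBase")

# sub() leaves non-matching strings untouched:
# returns c("7", "Willie Mays", "FirstBase")
sub('.*\\+\\s*(.*)$', '\\1', ids)

# Only apply the substitution where a "+" is present; other rows get NA
has_plus <- grepl('+', ids, fixed = TRUE)
newcol <- ifelse(has_plus, sub('.*\\+\\s*(.*)$', '\\1', ids), NA_character_)
newcol
# returns c("7", NA, "FirstBase")
```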

Or do the opposite: instead of capturing, remove everything up to and including the "+" (plus any spaces after it).

sub('.*\\+\\s*', '', df$PlayerID)
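Because ".*" is greedy, both patterns above keep whatever follows the last "+" if a PlayerID happens to contain more than one. If you would rather split at the first "+", a lazy ".*?" does that; this sketch passes perl = TRUE so the lazy quantifier is handled by PCRE. The value with two plus signs is hypothetical:

```r
# Hypothetical PlayerID with two plus signs
x <- "Hank Aaron + 7 + Outfield"

# Greedy ".*" backtracks only as far as the LAST "+"
after_last <- sub('.*\\+\\s*', '', x)
after_last
# returns "Outfield"

# Lazy ".*?" (PCRE via perl = TRUE) stops at the FIRST "+"
after_first <- sub('^.*?\\+\\s*', '', x, perl = TRUE)
after_first
# returns "7 + Outfield"
```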

If there is only one word after the +, you can also use stringr::word() to get the last word, with no regex at all.

stringr::word(df$PlayerID, -1)
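If you'd rather not add the stringr dependency, a rough base-R analogue of word(x, -1) splits on single spaces and keeps the last piece. Like word(), it only returns the final word, so a hypothetical value such as "Left Field" after the plus would come back as just "Field":

```r
ids <- c("Hank Aaron + 7", "Ted Williams + 2i", "Lou Gehrig + FirstBase")

# Split each string on single spaces, then keep the final piece of each
last_word <- vapply(strsplit(ids, " ", fixed = TRUE),
                    function(w) w[length(w)],
                    character(1))
last_word
# returns c("7", "2i", "FirstBase")
```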

Data:

df <- structure(list(PlayerID = c("Hank Aaron + 7", "Babe Ruth + 5", 
"Ted Williams + 2i", "Hank Aaron + Outfield", "Lou Gehrig + FirstBase"
)), class = "data.frame", row.names = c(NA, -5L))
Ronak Shah

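If there is only one plus sign in each PlayerID, you can combine sapply and strsplit in base R. This assumes the + is surrounded by single spaces, because the split string " + " is matched literally (fixed = TRUE):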

df$NewColumn <- sapply(strsplit(df$PlayerID, split = " + ", fixed = TRUE), function(x) x[[2]])

df
#                 PlayerID NewColumn
# 1         Hank Aaron + 7         7
# 2          Babe Ruth + 5         5
# 3      Ted Williams + 2i        2i
# 4  Hank Aaron + Outfield  Outfield
# 5 Lou Gehrig + FirstBase FirstBase
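Note that x[[2]] throws a "subscript out of bounds" error for any row that lacks " + ". A slightly more defensive sketch returns NA for such rows and uses vapply() so the result is guaranteed to be a character vector; the row "Willie Mays" is hypothetical:

```r
# "Willie Mays" is a hypothetical row with no " + " separator
ids <- c("Hank Aaron + 7", "Willie Mays", "Lou Gehrig + FirstBase")

parts <- strsplit(ids, split = " + ", fixed = TRUE)

# Take the second piece when there is one, otherwise NA;
# vapply() checks that every result is a single string
new_col <- vapply(parts,
                  function(p) if (length(p) >= 2) p[[2]] else NA_character_,
                  character(1))
new_col
# returns c("7", NA, "FirstBase")
```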
Ric S

Here is an approach using the tidyverse.

library(tidyverse)

PlayerID <- c(
"Hank Aaron + 7",
"Babe Ruth + 5",       
"Ted Williams + 2i",  
"Hank Aaron + Outfield",
"Lou Gehrig + FirstBase"
)

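df <- data.frame(PlayerID, stringsAsFactors = FALSE)
df %>% 
  # split on runs of non-alphanumeric characters (separate()'s default sep),
  # e.g. "Hank Aaron + 7" -> "Hank", "Aaron", "7"; assumes two-word names
  separate(PlayerID, into = c('Player', 'a', 'newColumn'), fill = 'right') %>% 
  # paste first and last name back together as Name, then drop the pieces
  unite('Name', Player:a, remove = FALSE, sep = ' ') %>% 
  select(-c(Player:a))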
#>           Name newColumn
#> 1   Hank Aaron         7
#> 2    Babe Ruth         5
#> 3 Ted Williams        2i
#> 4   Hank Aaron  Outfield
#> 5   Lou Gehrig FirstBase
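For comparison, a base-R sketch that also splits PlayerID into a name and the trailing value could use utils::strcapture() with one capture group per new column. Unlike the separate() version above, it splits on the "+" itself rather than on every non-alphanumeric character, so it does not depend on names having exactly two words, and it keeps the original PlayerID column:

```r
df <- data.frame(
  PlayerID = c("Hank Aaron + 7", "Babe Ruth + 5", "Ted Williams + 2i",
               "Hank Aaron + Outfield", "Lou Gehrig + FirstBase"),
  stringsAsFactors = FALSE
)

# One capture group per output column: the lazy (.+?) takes the name up to
# the first "+", and (.+) takes everything after it
parts <- utils::strcapture(
  "^(.+?)\\s*\\+\\s*(.+)$", df$PlayerID,
  proto = data.frame(Name = character(), NewColumn = character(),
                     stringsAsFactors = FALSE),
  perl = TRUE
)

res <- cbind(df, parts)  # keeps the original PlayerID column
res
```

Rows that contain no "+" would come back as NA in both new columns.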
MarBlo