We can use sub to match the metacharacter | (escaped as \\| in the pattern) followed by the rest of the characters to the end of the string (.*), and replace it with "" (an empty string).
sub("\\|.*", "", str1)
#[1] "Application Games" "E-Commerce"
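Two edge cases are worth checking with this pattern. The inputs below are made up for illustration and are not part of the original data. A string without any | is returned unchanged, and a string that starts with | is reduced to an empty string.

```r
# Hypothetical inputs to show how sub("\\|.*", "", ...) handles edge cases
x <- c("Application Games|Real Time", "NoPipe", "|Leading")

# Everything from the first "|" to the end is dropped; no "|" means no change
res <- sub("\\|.*", "", x)
res
# returns c("Application Games", "NoPipe", "")
```
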
This can also be done with a capture group: from the start of the string, match one or more characters that are not |, capture them as a group, and use the backreference \\1 to that group in the replacement.
sub("^([^|]+)\\|.*", "\\1", str1)
#[1] "Application Games" "E-Commerce"
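The capture-group pattern is stricter than the first one. It needs at least one non-| character followed by a |, so strings that fail to match are returned as they are. The inputs here are hypothetical. Note that "|Leading" stays intact with this pattern but becomes "" with sub("\\|.*", "", ...).

```r
# Hypothetical inputs, same as in the edge-case check for the first pattern
x <- c("Application Games|Real Time", "NoPipe", "|Leading")

# ^([^|]+) requires one or more non-"|" characters before the first "|";
# "NoPipe" has no "|" and "|Leading" has nothing before it, so neither matches
res <- sub("^([^|]+)\\|.*", "\\1", x)
res
# returns c("Application Games", "NoPipe", "|Leading")
```
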
If a package solution is preferred, str_extract from stringr can be used as well:
library(stringr)
str_extract(str1, "[^|]+")
#[1] "Application Games" "E-Commerce"
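For comparison, base R can do the same extraction with regexpr and regmatches. This is a swapped-in base R technique, not something stringr does internally. One difference: regmatches drops elements that have no match, so its result can be shorter than the input, while str_extract returns NA for them.

```r
str1 <- c("Application Games|Real Time|Social Media",
          "E-Commerce|Cryogenesis|Real Estate")

# regexpr() locates the first run of non-"|" characters in each string,
# and regmatches() pulls out the matched text
first <- regmatches(str1, regexpr("[^|]+", str1))
first
# returns c("Application Games", "E-Commerce")
```
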
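Or using word from stringr, with the separator written as the character class [|] so that | is matched literally: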
word(str1, 1, sep="[|]")
#[1] "Application Games" "E-Commerce"
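word also takes a field position, for example word(str1, 2, sep = "[|]"). The same idea can be sketched in base R with a single sub call. The helper nth_field below is hypothetical and not part of stringr. It skips n - 1 leading "field|" chunks, captures the next field, and drops the rest. If a string has fewer than n fields the pattern does not match, and sub returns that string unchanged.

```r
# Hypothetical helper (not from stringr): n-th "|"-separated field via one regex
nth_field <- function(x, n) {
  stopifnot(n >= 1)
  # (?:[^|]*\|){k} skips k fields, ([^|]*) captures the next one, .*$ drops the rest
  pattern <- sprintf("^(?:[^|]*\\|){%d}([^|]*).*$", as.integer(n - 1))
  sub(pattern, "\\1", x, perl = TRUE)
}

str1 <- c("Application Games|Real Time|Social Media",
          "E-Commerce|Cryogenesis|Real Estate")
nth_field(str1, 2)
# returns c("Real Time", "Cryogenesis")
```
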
NOTE: Here, too, I showed compact base R and package code that works without splitting the strings or looping over them.
Benchmarks
str2 <- rep(str1, 1e5)
system.time(sub("\\|.*", "", str2) )
# user system elapsed
# 0.20 0.00 0.21
system.time(str_extract(str2, "[^|]+") )
# user system elapsed
# 0.08 0.00 0.08
system.time({
l <- strsplit(str2,"\\|")
sapply(1:length(l), function(i) l[[i]][1])
})
# user system elapsed
# 0.5 0.0 0.5
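If a split-based version is wanted anyway, the loop above can be written more idiomatically. strsplit with fixed = TRUE splits on a literal | without a regex, and vapply takes the first piece of each element while guaranteeing a character result. This is only a sketch. Its timing was not measured here, so no speed claim is made relative to the numbers above.

```r
str1 <- c("Application Games|Real Time|Social Media",
          "E-Commerce|Cryogenesis|Real Estate")
str2 <- rep(str1, 1e5)

# Split on a literal "|" and keep element 1 of each piece
split_first <- function(x) {
  vapply(strsplit(x, "|", fixed = TRUE), `[`, character(1), 1)
}

# Same result as the sub() approach on this data
stopifnot(identical(split_first(str2), sub("\\|.*", "", str2)))
system.time(split_first(str2))
```
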
data
str1 <- c("Application Games|Real Time|Social Media",
"E-Commerce|Cryogenesis|Real Estate")