Extract string before "|"

Question

I have a data set wherein a column looks like this:

ABC|DEF|GHI,  
ABCD|EFG|HIJK,  
ABCDE|FGHI|JKL,  
DEF|GHIJ|KLM,  
GHI|JKLM|NO|PQRS,  
BCDE|FGHI|JKL

.... and so on

I need to extract the characters that appear before the first | symbol.

In Excel, we would use a combination of MID and SEARCH, or LEFT and SEARCH. R has substr().

The syntax is substr(x, <start>, <stop>).

In my case, start will always be 1. For stop, we need to find the position of the first |. How can we achieve this? Are there alternative ways to do this?

`regexpr` (see `?regexpr`) returns the index of the first match, which you can use to build your "stop" argument: `regexpr("|", x, fixed = TRUE) - 1` – alexis_laz, Jul 10 '16 at 12:42
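A minimal sketch of how the comment's suggestion fits into substr(), in base R. The names x, y, pos and safe are illustrative and not from the original post, and the fallback for strings that contain no | is an assumption about the desired behaviour.

```r
# Sample values taken from the question; `x` is an illustrative name.
x <- c("ABC|DEF|GHI", "ABCD|EFG|HIJK", "GHI|JKLM|NO|PQRS")

# Position of the first literal "|" in each string (-1 if there is none).
pos <- as.integer(regexpr("|", x, fixed = TRUE))

# Keep everything before that position.
first_field <- substr(x, 1, pos - 1)
print(first_field)
#> [1] "ABC"  "ABCD" "GHI"

# A string without "|" gives pos == -1, so substr() would return "".
# Assumed fallback: return the whole string in that case.
y <- c("ABC|DEF", "NOPIPE")
pos_y <- as.integer(regexpr("|", y, fixed = TRUE))
safe <- ifelse(pos_y > 0, substr(y, 1, pos_y - 1), y)
print(safe)
#> [1] "ABC"    "NOPIPE"
```

With fixed = TRUE the | is matched literally, so it does not need escaping as it would in a regular expression.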

score 67 · Answer 1 · akrun · 2016-07-10T20:19:02.970

We can use sub to remove everything from the first | onward:

sub("\\|.*", "", str1)
#[1] "ABC"

Or with strsplit, taking the first piece:

strsplit(str1, "[|]")[[1]][1]
#[1] "ABC"

Update

If we use the data from @hrbrmstr

sub("\\|.*", "", df$V1)
#[1] "ABC"   "ABCD"  "ABCDE" "DEF"   "GHI"   "BCDE"

These are all base R methods. No external packages used.
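The strsplit call above indexes a single string with [[1]][1]. For a whole column, one possible vectorized variant in base R is sketched below; the vector x and the names parts and first_field are illustrative, not from the answer.

```r
# Sample values from the question; `x` is an illustrative name.
x <- c("ABC|DEF|GHI", "ABCD|EFG|HIJK", "GHI|JKLM|NO|PQRS")

# strsplit() returns a list with one character vector per input string.
parts <- strsplit(x, "|", fixed = TRUE)

# Take the first piece of each vector; vapply() enforces a character result.
first_field <- vapply(parts, `[`, character(1), 1)
print(first_field)
#> [1] "ABC"  "ABCD" "GHI"
```

The sub approach is already vectorized, so this is only needed if you prefer splitting over pattern replacement.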

data

str1 <- "ABC|DEF|GHI ABCD|EFG|HIJK ABCDE|FGHI|JKL DEF|GHIJ|KLM GHI|JKLM|NO|PQRS BCDE|FGHI|JKL"

edited Jul 10 '16 at 20:19

answered Jul 10 '16 at 12:21

akrun


score 25 · Answer 2 · answered Jul 10 '16 at 18:11

Another option is the word function from the stringr package:

library(stringr)
word(df1$V1,1,sep = "\\|")

Data

df1 <- read.table(text = "ABC|DEF|GHI,  
ABCD|EFG|HIJK,  
ABCDE|FGHI|JKL,  
DEF|GHIJ|KLM,  
GHI|JKLM|NO|PQRS,  
BCDE|FGHI|JKL")

user2100721


I especially like this package's ability to get, for example, the first two "words". – Nova Sep 24 '18 at 13:48
so simple - thank you! – amc Feb 15 '20 at 18:40
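To illustrate the point in the first comment, here is a small sketch of word() with a start and end position. It assumes the stringr package is installed; the vector x and the result names are illustrative, and fixed("|") is used here instead of the escaped regex from the answer.

```r
library(stringr)

# Sample values from the question; `x` is an illustrative name.
x <- c("ABC|DEF|GHI", "GHI|JKLM|NO|PQRS")

# First field only.
first_one <- word(x, 1, sep = fixed("|"))
print(first_one)
#> [1] "ABC" "GHI"

# First two fields, still joined by the original separator.
first_two <- word(x, 1, 2, sep = fixed("|"))
print(first_two)
#> [1] "ABC|DEF"  "GHI|JKLM"
```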

score 5 · Answer 3 · answered Jul 10 '16 at 14:43

With stringi, capturing the text before the first |:

library(stringi)

df <- read.table(text="ABC|DEF|GHI,1
ABCD|EFG|HIJK,2
ABCDE|FGHI|JKL,3  
DEF|GHIJ|KLM,4
GHI|JKLM|NO|PQRS,5
BCDE|FGHI|JKL,6", sep=",", header=FALSE, stringsAsFactors=FALSE)

stri_match_first_regex(df$V1, "(.*?)\\|")[,2]
## [1] "ABC"   "ABCD"  "ABCDE" "DEF"   "GHI"   "BCDE"
