First entry from string split

Question

I've got a column people$food that has entries like chocolate or apple-orange-strawberry.

I want to split people$food by - and get the first entry from the split.

In Python, the solution would be food.split('-')[0], but I can't find an equivalent in R.

score 60 · Accepted Answer · answered Nov 13 '15 at 00:28


If you need to extract the first (or nth) entry from each split, use:

word <- c('apple-orange-strawberry','chocolate')

sapply(strsplit(word,"-"), `[`, 1)
#[1] "apple"     "chocolate"

Or, faster and more explicitly:

vapply(strsplit(word,"-"), `[`, 1, FUN.VALUE=character(1))
#[1] "apple"     "chocolate"

Both versions can select any position in the split result, and return NA when the requested position is out of range:

vapply(strsplit(word,"-"), `[`, 2, FUN.VALUE=character(1))
#[1] "orange" NA
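To connect this back to the column in the question, here is a minimal sketch. The `people` data frame below is a made-up stand-in for the asker's data, and `first_food` is a hypothetical column name chosen for illustration.

```r
# Hypothetical stand-in for the asker's data frame
people <- data.frame(
  food = c("chocolate", "apple-orange-strawberry"),
  stringsAsFactors = FALSE
)

# Split every entry once (strsplit is vectorised), then keep piece 1 of each
pieces <- strsplit(people$food, "-", fixed = TRUE)
people$first_food <- vapply(pieces, `[`, 1, FUN.VALUE = character(1))

people$first_food
```

Passing `fixed = TRUE` treats "-" as a literal string rather than a regular expression, which is optional here but avoids surprises with separators such as "." or "|".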


thelatemail


16

Just as a quick aside, for non-R natives this ``sapply(strsplit(word,"-"), `[`, 1)`` is just plain unreadable. – Private Oct 14 '19 at 11:30
5

I guess if you wanted to use a word instead and you were only going for a single value you could do `sapply(strsplit(word,"-"), getElement, 1)` – thelatemail May 22 '20 at 02:08
2

Using the word 'getElement' adds a LOT in terms of readability (and hence quality). Thanks – Private May 22 '20 at 15:52

gented · Answer 2 · 2015-11-13T00:28:31.317

29

For example

word <- 'apple-orange-strawberry'

strsplit(word, "-")[[1]][1]
[1] "apple"

or, equivalently

unlist(strsplit(word, "-"))[1]

Essentially the idea is that strsplit returns a list, whose elements have to be accessed either by indexing into it (the former case) or by unlisting it (the latter).
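A small sketch of what that list looks like for a single input string may help; the variable name `parts` is only illustrative.

```r
word <- "apple-orange-strawberry"
parts <- strsplit(word, "-")

is.list(parts)   # TRUE: one list element per input string
length(parts)    # 1, because `word` holds a single string
parts[[1]]       # the character vector "apple" "orange" "strawberry"
parts[[1]][1]    # "apple"
```

The double brackets pull the character vector out of the list, and the single bracket then picks a position within that vector.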

If you want to apply the method to an entire column:

first.word <- function(my.string){
    unlist(strsplit(my.string, "-"))[1]
}

words <- c('apple-orange-strawberry', 'orange-juice')

sapply(words, first.word)
apple-orange-strawberry            orange-juice 
                "apple"                "orange"

edited Nov 13 '15 at 00:28

answered Nov 13 '15 at 00:22


`strsplit` is vectorised, so there is no need to `sapply` it over each individual item in the vector. See my answer below. – thelatemail Nov 13 '15 at 00:29

score 22 · Answer 3 · answered Nov 13 '15 at 00:25

I would use sub() instead. Since you want the first "word" before the split, we can simply remove everything from the first - onward, and that's what we're left with.

sub("-.*", "", people$food)

Here's an example -

x <- c("apple", "banana-raspberry-cherry", "orange-berry", "tomato-apple")
sub("-.*", "", x)
# [1] "apple"  "banana" "orange" "tomato"

Otherwise, if you want to use strsplit(), you can collect the first elements with vapply():

vapply(strsplit(x, "-", fixed = TRUE), "[", "", 1)
# [1] "apple"  "banana" "orange" "tomato"
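As a quick sanity check, the two approaches can be compared on the same input. This is only a sketch on the example vector above; the names `via_sub` and `via_split` are made up for illustration.

```r
x <- c("apple", "banana-raspberry-cherry", "orange-berry", "tomato-apple")

# Regex approach: drop the first "-" and everything that follows it
via_sub <- sub("-.*", "", x)

# Split approach: split on a literal "-" and keep the first piece
via_split <- vapply(strsplit(x, "-", fixed = TRUE), `[`, character(1), 1)

identical(via_sub, via_split)
```

On this input both give the same character vector, so the choice is mostly about readability and whether later pieces are needed as well.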

Using `sub` function greatly simplifies the possibility of getting the first elements of even multi-dimensional lists after applying the required string patterns, thanks for the nice answer! — Elias, Feb 07 '22 at 10:51

score 8 · Answer 4 · answered Nov 13 '15 at 01:11


I would suggest using head rather than [ in R.

word <- c('apple-orange-strawberry','chocolate')
sapply(strsplit(word, "-"), head, 1)
# [1] "apple"     "chocolate"
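The same pattern extends to the last entry by swapping head for tail. A minimal sketch, assuming every split yields at least one piece (vapply would stop with an error on an empty split, because tail would then return a zero-length vector):

```r
word <- c("apple-orange-strawberry", "chocolate")
pieces <- strsplit(word, "-")

# First and last piece of each split
first_piece <- vapply(pieces, head, FUN.VALUE = character(1), n = 1)
last_piece  <- vapply(pieces, tail, FUN.VALUE = character(1), n = 1)

first_piece
last_piece
```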


Ven Yao


That is a bit more readable! – akorejwa Apr 26 '23 at 18:42

score 4 · Answer 5 · answered Aug 30 '18 at 07:12


dplyr/magrittr approach (extract2() is exported by magrittr, so library(magrittr) is the one that is required):

library(magrittr)
library(dplyr)

word = c('apple-orange-strawberry', 'chocolate')

strsplit(word, "-") %>% sapply(extract2, 1)
# [1] "apple"     "chocolate"


Ömer An


score 3 · Answer 6 · answered Feb 15 '23 at 17:19

stringr 1.5.0 introduced str_split_i to do this easily:

library(stringr)

str_split_i(c('apple-orange-strawberry','chocolate'), "-", 1)
[1] "apple"     "chocolate"

The third argument represents the index you want to extract. Also new is that you can use negative values to index from the right:

str_split_i(c('apple-orange-strawberry','chocolate'), "-", -1)
[1] "strawberry" "chocolate"

M.Viking · Answer 7 · 2022-10-18T18:28:14.530

Using str_remove() to delete everything after the pattern:

df <- data.frame(words = c('apple-orange-strawberry', 'chocolate'))

dplyr::mutate(df, short = stringr::str_remove(words, "-.*")) # mutate method

stringr::str_remove(df$words, "-.*")           # str_remove example

stringr::str_replace(df$words, "-.*", "")      # str_replace method

stringr::str_split_fixed(df$words, "-", n=2)[,1]        # str_split method similar to original question's Python code

tidyr::separate(df, words, into = c("short", NA)) # separate(): replaces words with short and warns about extra or missing pieces

Output of the mutate() call:

                    words       short
1 apple-orange-strawberry       apple
2               chocolate   chocolate

score -1 · Answer 8 · answered Apr 10 '23 at 18:18


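Use the map() function from purrr.

library(purrr)

# string is a character vector and sep is the separator, e.g. "-"
strsplit(string, sep)

map(strsplit(string, sep), 2) # the second argument of map() is the position to extract from each split (1, 2, 3, ...); the result is a list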


P-polycephalum


As it’s currently written, your answer is unclear. Please [edit] to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Apr 11 '23 at 17:02
