45

I've got a column people$food that has entries like chocolate or apple-orange-strawberry.

I want to split people$food by - and get the first entry from the split.

In python, the solution would be food.split('-')[0], but I can't find an equivalent for R.

zx8754
Pistol Pete

8 Answers

60

If you need to extract the first (or nth) entry from each split, use:

word <- c('apple-orange-strawberry','chocolate')

sapply(strsplit(word,"-"), `[`, 1)
#[1] "apple"     "chocolate"

Or faster and more explicitly:

vapply(strsplit(word,"-"), `[`, 1, FUN.VALUE=character(1))
#[1] "apple"     "chocolate"

Both versions can select any position from each split, and they return NA when the requested position is beyond the number of pieces:

vapply(strsplit(word,"-"), `[`, 2, FUN.VALUE=character(1))
#[1] "orange" NA  
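To tie this back to the question's column, here is a minimal sketch. The `people` data frame below is a hypothetical stand-in for the asker's data, and `first_food` is just an illustrative column name.

```r
# Hypothetical stand-in for the question's people$food column
people <- data.frame(
  food = c("chocolate", "apple-orange-strawberry"),
  stringsAsFactors = FALSE
)

# strsplit() returns a list with one character vector per row
pieces <- strsplit(people$food, "-", fixed = TRUE)

# Take element 1 of each vector; vapply() checks that each result is a single string
people$first_food <- vapply(pieces, `[`, character(1), 1)

people$first_food
# "chocolate" "apple"
```

Here `fixed = TRUE` treats "-" as a literal string rather than a regular expression, which is all that is needed for this separator.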
thelatemail
  • 16
    Just as a quick aside, for non-R natives this `sapply(strsplit(word,"-"), `[`, 1)` is just plain unreadable. – Private Oct 14 '19 at 11:30
  • 5
    I guess if you wanted to use a word instead and you were only going for a single value you could do `sapply(strsplit(word,"-"), getElement, 1)` – thelatemail May 22 '20 at 02:08
  • 2
    Using the word 'getElement' adds a LOT in terms of readability (and hence quality). Thanks – Private May 22 '20 at 15:52
29

For example

word <- 'apple-orange-strawberry'

strsplit(word, "-")[[1]][1]
[1] "apple"

or, equivalently

unlist(strsplit(word, "-"))[1]

Essentially the idea is that strsplit() returns a list, whose elements have to be accessed either by indexing with [[ ]] (the former case) or by unlisting (the latter).
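A small sketch of the difference between the two access styles when there is more than one input string (the variable names are only for illustration):

```r
words <- c("apple-orange-strawberry", "orange-juice")
parts <- strsplit(words, "-")

is.list(parts)              # TRUE: one list element per input string
length(parts)               # 2

# Indexing keeps the per-string structure
first_of_first <- parts[[1]][1]    # "apple"
first_of_second <- parts[[2]][1]   # "orange"

# unlist() flattens everything into one vector, so the boundaries are lost
flat <- unlist(parts)       # 5 pieces in total
flat[1]                     # "apple" only
```

So unlist(...)[1] is fine for a single string, but on a whole vector it returns only the first piece of the first string.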

If you want to apply the method to an entire column:

first.word <- function(my.string){
    unlist(strsplit(my.string, "-"))[1]
}

words <- c('apple-orange-strawberry', 'orange-juice')

sapply(words, first.word)
apple-orange-strawberry            orange-juice 
                "apple"                "orange"
gented
  • `strsplit` is vectorised, so there is no need to `sapply` it over each individual item in the vector. See my answer below. – thelatemail Nov 13 '15 at 00:29
22

I would use sub() instead. Since you want the first "word" before the split, we can simply remove everything after the first - and that's what we're left with.

sub("-.*", "", people$food)

Here's an example -

x <- c("apple", "banana-raspberry-cherry", "orange-berry", "tomato-apple")
sub("-.*", "", x)
# [1] "apple"  "banana" "orange" "tomato"
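A quick sketch of how the same pattern behaves on a few edge cases (the input values are made up for illustration):

```r
# "-.*" matches the first hyphen and everything after it
x <- c("apple", "banana-raspberry-cherry", "-leading", NA)
first <- sub("-.*", "", x)

first[1]   # "apple"  (no hyphen, so nothing is removed)
first[2]   # "banana"
first[3]   # ""       (the hyphen is the first character)
first[4]   # NA       (missing values stay missing)
```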

Otherwise, if you want to use strsplit() you can round up the first elements with vapply()

vapply(strsplit(x, "-", fixed = TRUE), "[", "", 1)
# [1] "apple"  "banana" "orange" "tomato"
Rich Scriven
  • Using `sub` function greatly simplifies the possibility of getting the first elements of even multi-dimensional lists after applying the required string patterns, thanks for the nice answer! – Elias Feb 07 '22 at 10:51
8

I would suggest using head rather than [ in R.

word <- c('apple-orange-strawberry','chocolate')
sapply(strsplit(word, "-"), head, 1)
# [1] "apple"     "chocolate"
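One thing to keep in mind, shown in the sketch below: head(x, n) returns the first n elements rather than the nth one, so it only matches `[` when n is 1.

```r
parts <- strsplit(c("apple-orange-strawberry", "chocolate"), "-")

# `[` with 2 picks the second piece, NA where there is none
second <- sapply(parts, `[`, 2)
# "orange" NA

# head() with 2 keeps up to the first two pieces, so the results
# have different lengths and sapply() falls back to a list
first_two <- sapply(parts, head, 2)
# list(c("apple", "orange"), "chocolate")
```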
Ven Yao
4

dplyr/magrittr approach:

library(magrittr)
library(dplyr)

word = c('apple-orange-strawberry', 'chocolate')

strsplit(word, "-") %>% sapply(extract2, 1)
# [1] "apple"     "chocolate"
Ömer An
3

stringr 1.5.0 introduced str_split_i to do this easily:

library(stringr)

str_split_i(c('apple-orange-strawberry','chocolate'), "-", 1)
[1] "apple"     "chocolate"

The third argument represents the index you want to extract. Also new is that you can use negative values to index from the right:

str_split_i(c('apple-orange-strawberry','chocolate'), "-", -1)
[1] "strawberry" "chocolate" 
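If an older stringr is installed, a rough base R analogue of the "last piece" case can be sketched with tail(). This is not how stringr implements it, and it assumes every input string is non-empty:

```r
word <- c("apple-orange-strawberry", "chocolate")

# tail(p, 1) keeps the last piece of each split
last <- vapply(
  strsplit(word, "-", fixed = TRUE),
  function(p) tail(p, 1),
  character(1)
)

last
# "strawberry" "chocolate"
```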
LMc
2

Using str_remove() to delete everything after the pattern:

df <- data.frame(words = c('apple-orange-strawberry', 'chocolate'))

dplyr::mutate(df, short = stringr::str_remove(words, "-.*")) # mutate method
#                     words     short
# 1 apple-orange-strawberry     apple
# 2               chocolate chocolate

stringr::str_remove(df$words, "-.*")           # str_remove example

stringr::str_replace(df$words, "-.*", "")      # str_replace method

stringr::str_split_fixed(df$words, "-", n=2)[,1]        # str_split method similar to original question's Python code

tidyr::separate(df, words, into = c("short", NA), sep = "-", extra = "drop", fill = "right") # using the separate function; returns only the short column
M.Viking
-1

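Use the map() function from purrr.

strsplit(string, sep) # returns a list with one character vector per input string

purrr::map(strsplit(string, sep), 2) # the second argument of map() is the position to extract from each split (1, 2, 3, etc.); the result is a list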
