I am running into some problems doing text processing using dplyr and stringr functions (specifically str_split()). I think I am misunderstanding something very fundamental about how to use dplyr correctly when dealing with elements that are vectors/lists.
Here's a tibble, df...
library(tidyverse)
df <- tribble(
  ~item, ~phrase,
  "one", "romeo and juliet",
  "two", "laurel and hardy",
  "three", "apples and oranges and pears and peaches"
)
Now I create a new column, splitPhrase, by doing str_split() on one of the columns using "and" as the delimiter.
df <- df %>%
  mutate(splitPhrase = str_split(phrase, "and"))
That seems to work, sort of. In RStudio I see this...
In the console I see that my new column, splitPhrase, is actually a list column, with one character vector per row... but it looks correct in the RStudio display, right?
df
#> # A tibble: 3 x 3
#>   item  phrase                                   splitPhrase
#>   <chr> <chr>                                    <list>
#> 1 one   romeo and juliet                         <chr [2]>
#> 2 two   laurel and hardy                         <chr [2]>
#> 3 three apples and oranges and pears and peaches <chr [4]>
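To check what the list column actually holds, each element can be pulled out directly. This is a minimal sketch that rebuilds the same split outside the tibble; the names `phrases` and `pieces` are only for illustration and are not part of the code above.

```r
library(stringr)

phrases <- c(
  "romeo and juliet",
  "laurel and hardy",
  "apples and oranges and pears and peaches"
)

# str_split() returns a list with one character vector per input string
pieces <- str_split(phrases, "and")

class(pieces)    # a plain list, not a character vector
lengths(pieces)  # number of pieces per phrase

# Double brackets pull out a single element; the spaces around the
# delimiter are kept in the pieces
pieces[[3]]
```

So each row of splitPhrase holds a whole character vector, and the pieces still carry the whitespace that surrounded "and".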
What I ultimately want to do is to extract the last item of each splitPhrase. In other words, I'd like to get to this...
The problem is I can't see how to just grab the last element in each splitPhrase. If it were just a vector, I could do something like this...
last(c("a", "b", "c"))
#> [1] "c"
But that doesn't work within the tibble, and neither do the other things that come to mind:
df <- df %>%
  mutate(lastThing = last(splitPhrase))
# Error in mutate_impl(.data, dots) :
# Column `lastThing` must be length 3 (the number of rows) or one, not 4
df <- df %>%
  group_by(splitPhrase) %>%
  mutate(lastThing = last(splitPhrase))
# Error in grouped_df_impl(data, unname(vars), drop) :
# Column `splitPhrase` can't be used as a grouping variable because it's a list
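The first error may be explained by last() being applied to the whole list column at once rather than to each row's vector. A minimal sketch, assuming the same split as above (the names `pieces` and `whole_column_last` are illustrative only):

```r
library(dplyr)
library(stringr)

pieces <- str_split(
  c(
    "romeo and juliet",
    "laurel and hardy",
    "apples and oranges and pears and peaches"
  ),
  "and"
)

# last() on a list returns its final element, which here is the
# character vector that came from the third phrase
whole_column_last <- last(pieces)

# That vector has 4 pieces, which matches the "not 4" in the error
length(whole_column_last)
```

If that reading is right, mutate() receives a length-4 vector for a 3-row tibble, hence the complaint about the column length.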
So, I think I am "not getting" how to work with vectors that sit inside the elements of a tibble column. It seems to have something to do with the fact that, in my example, the column is actually a list of vectors.
Is there a particular function that will help me out here, or a better way of getting to this?
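One possible approach, offered as a sketch and not as the only way: apply last() to each list element separately with purrr's map_chr(), which returns one string per row. The `result` name and the `lastThingTrimmed` column are hypothetical additions for illustration; str_trim() is used only because the split pieces keep their surrounding spaces.

```r
library(dplyr)
library(purrr)
library(stringr)
library(tibble)

df <- tribble(
  ~item, ~phrase,
  "one", "romeo and juliet",
  "two", "laurel and hardy",
  "three", "apples and oranges and pears and peaches"
)

result <- df %>%
  mutate(
    splitPhrase = str_split(phrase, "and"),
    # map_chr() calls last() once per list element and collects the
    # results into an ordinary character vector
    lastThing = map_chr(splitPhrase, last),
    # hypothetical extra column: drop the whitespace left by the split
    lastThingTrimmed = str_trim(lastThing)
  )

result$lastThingTrimmed
```

Because map_chr() works element by element, no grouping is needed. Splitting on " and " (with spaces) instead of "and" might avoid the trimming step, though that changes the delimiter used in the question.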
Created on 2018-09-27 by the reprex package (v0.2.1)