I have a column in my data frame, categories, that is a list of character vectors. This is the structure of the data frame:
str(df)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 4 obs. of 3 variables:
$ categories:List of 4
..$ : chr "Tex-Mex" "Mexican" "Fast Food" "Restaurants"
..$ : chr "Hawaiian" "Restaurants" "Barbeque"
..$ : chr "Restaurants" "Italian" "Seafood"
..$ : chr "Restaurants" "Mexican" "American (Traditional)"
$ name : chr "Taco Bell" "Ohana Hawaiian BBQ" "Carrabba's Italian Grill" "Don Tequila"
$ type : chr "business" "business" "business" "business"
Here is the dput output of the first four rows:
structure(list(categories = list(c("Tex-Mex", "Mexican", "Fast Food",
"Restaurants"), c("Hawaiian", "Restaurants", "Barbeque"), c("Restaurants",
"Italian", "Seafood"), c("Restaurants", "Mexican", "American (Traditional)"
)), name = c("Taco Bell", "Ohana Hawaiian BBQ", "Carrabba's Italian Grill",
"Don Tequila"), type = c("business", "business", "business",
"business")), row.names = c(NA, -4L), class = c("tbl_df", "tbl",
"data.frame"), .Names = c("categories", "name", "type"))
I want to keep only certain values in each element of that list and drop everything else.
For example, I want to remove all values that are neither "Mexican" nor "Restaurants", so that only "Mexican" and "Restaurants" remain. To do so I tried this solution:
# unnest() and nest() come from tidyr
df_test <- df %>%
  unnest(categories) %>%
  filter(str_detect(categories, "Mexican") |
         str_detect(categories, "Restaurants")) %>%
  nest(categories)
But the result looks like this:
str(df_test)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 4 obs. of 3 variables:
$ name: chr "Taco Bell" "Ohana Hawaiian BBQ" "Carrabba's Italian Grill" "Don Tequila"
$ type: chr "business" "business" "business" "business"
$ data:List of 4
..$ :Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of 1 variable:
.. ..$ categories: chr "Mexican" "Restaurants"
..$ :Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 1 obs. of 1 variable:
.. ..$ categories: chr "Restaurants"
..$ :Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 1 obs. of 1 variable:
.. ..$ categories: chr "Restaurants"
..$ :Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of 1 variable:
.. ..$ categories: chr "Restaurants" "Mexican"
The problem is that the resulting column is a list of data frames, not a character vector like the type column.
Is there a way to filter out those values so that the resulting column is a plain character vector like the name and type columns?
I don't want to replace the values I removed with anything else. If a row contains neither "Mexican" nor "Restaurants", the whole row should be removed.
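To make the intended result concrete, here is a minimal sketch in base R of the kind of output I am after. It is only an illustration, not necessarily the right approach: the names wanted, kept, has_match and result are made up for this example, the sample data is rebuilt as a plain data.frame rather than a tibble, and collapsing the remaining values into one comma-separated string per row is an assumption about what a "normal character vector" could look like here.

```r
# Sample data rebuilt from the dput above (plain data.frame for simplicity).
df <- data.frame(
  name = c("Taco Bell", "Ohana Hawaiian BBQ",
           "Carrabba's Italian Grill", "Don Tequila"),
  type = "business",
  stringsAsFactors = FALSE
)
df$categories <- list(
  c("Tex-Mex", "Mexican", "Fast Food", "Restaurants"),
  c("Hawaiian", "Restaurants", "Barbeque"),
  c("Restaurants", "Italian", "Seafood"),
  c("Restaurants", "Mexican", "American (Traditional)")
)

# Hypothetical name: the values that should survive the filter.
wanted <- c("Mexican", "Restaurants")

# Keep only the wanted values inside each list element (exact matching).
kept <- lapply(df$categories, function(x) x[x %in% wanted])

# Rows where nothing remains are dropped entirely.
has_match <- lengths(kept) > 0
result <- df[has_match, c("name", "type")]

# Assumption: one collapsed string per row gives a plain character column.
result$categories <- vapply(kept[has_match], paste, character(1),
                            collapse = ", ")

str(result)
```

With the four sample rows every business has at least one match, so no row is dropped; a row whose categories contained neither value would disappear from result.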
Used packages:
dplyr
stringr