Extract certain characters from list and convert them into a character vector

Question

My data frame has a list column, categories, in which each element is a character vector:

str(df)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   4 obs. of  3 variables:
 $ categories:List of 4
  ..$ : chr  "Tex-Mex" "Mexican" "Fast Food" "Restaurants"
  ..$ : chr  "Hawaiian" "Restaurants" "Barbeque"
  ..$ : chr  "Restaurants" "Italian" "Seafood"
  ..$ : chr  "Restaurants" "Mexican" "American (Traditional)"
 $ name      : chr  "Taco Bell" "Ohana Hawaiian BBQ" "Carrabba's Italian Grill" "Don Tequila"
 $ type      : chr  "business" "business" "business" "business"

Here is a dput of the first four rows:

structure(list(categories = list(c("Tex-Mex", "Mexican", "Fast Food", 
"Restaurants"), c("Hawaiian", "Restaurants", "Barbeque"), c("Restaurants", 
"Italian", "Seafood"), c("Restaurants", "Mexican", "American (Traditional)"
)), name = c("Taco Bell", "Ohana Hawaiian BBQ", "Carrabba's Italian Grill", 
"Don Tequila"), type = c("business", "business", "business", 
"business")), row.names = c(NA, -4L), class = c("tbl_df", "tbl", 
"data.frame"), .Names = c("categories", "name", "type"))

I want to extract some of the values from that list so that these values are the only ones that remain in that vector.

For example, I want to filter out all values that are not "Mexican" and not "Restaurants", so that only "Mexican" and "Restaurants" remain. To do so, I tried this solution:

df_test <- df %>% unnest(categories) %>% 
          filter(str_detect(categories, "Mexican") |
                 str_detect(categories, "Restaurants")) %>% 
          nest(categories)

But the result looks like this:

str(df_test)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   4 obs. of  3 variables:
 $ name: chr  "Taco Bell" "Ohana Hawaiian BBQ" "Carrabba's Italian Grill" "Don Tequila"
 $ type: chr  "business" "business" "business" "business"
 $ data:List of 4
  ..$ :Classes ‘tbl_df’, ‘tbl’ and 'data.frame':    2 obs. of  1 variable:
  .. ..$ categories: chr  "Mexican" "Restaurants"
  ..$ :Classes ‘tbl_df’, ‘tbl’ and 'data.frame':    1 obs. of  1 variable:
  .. ..$ categories: chr "Restaurants"
  ..$ :Classes ‘tbl_df’, ‘tbl’ and 'data.frame':    1 obs. of  1 variable:
  .. ..$ categories: chr "Restaurants"
  ..$ :Classes ‘tbl_df’, ‘tbl’ and 'data.frame':    2 obs. of  1 variable:
  .. ..$ categories: chr  "Restaurants" "Mexican"

The problem is that afterwards the column is not a character vector like the type column.

Is it possible to filter out those values so that, after the procedure, the column is a normal character vector like the name and type columns? I don't want to replace the values/rows I removed through this procedure. So if a row contains neither "Mexican" nor "Restaurants", that row should be removed.

Used packages: dplyr stringr

Please include all packages that you are using...Not just `dplyr` — Sotos, Nov 29 '17 at 13:13
Also, you would probably do better to add a fuller dataset (one or two additional variables) as well as present your desired outcome for that dataset. For example, it is unclear whether you want to drop all rows that do not contain "mexican" in this column or whether you want to replace that value with an NA. — lmo, Nov 29 '17 at 13:18
Why are you using `str_detect` there? You can simply do `df %>% unnest() %>% filter(categories == 'Mexican')` — Sotos, Nov 29 '17 at 13:47
I want to filter out more than one value. If I do it this way some rows will be multiplied. — Banjo, Nov 29 '17 at 13:51
So just do `df %>% unnest() %>% filter(categories %in% c('Mexican', 'Restaurants'))` — Sotos, Nov 29 '17 at 14:19
THX, but it's the same thing. If I use that code, the rows that contain both values will be doubled. If I `nest()` the `categories` after that, I get the same result as `str(df_test)` — Banjo, Nov 29 '17 at 14:27

manotheshark · Accepted Answer · 2017-11-29T17:27:15.647

Using lapply to subset the list

lapply(df1$categories, function(x) x[x %in% c("Mexican", "Restaurants")])

[[1]]
[1] "Mexican"     "Restaurants"

[[2]]
[1] "Restaurants"

[[3]]
[1] "Restaurants"

[[4]]
[1] "Restaurants" "Mexican"

Adding a row that matches none of the criteria, then dropping rows with no remaining values

df1 <- rbind(df1, c(list("Nothing to match"), "drop me", "business"))
df1$categories <- lapply(df1$categories, function(x) x[x %in% c("Mexican", "Restaurants")])
df1[sapply(df1$categories, length) > 0, ]
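
The filter-then-drop steps above can be tried without the original data. The following is a minimal sketch in base R. The three-row data frame and the names `keep` and `result` are made up for illustration and are not part of the question's data. It uses `lengths()` as a shorthand for `sapply(x, length)`.

```r
# Hypothetical stand-in for the question's data (base data.frame, not a tibble)
df1 <- data.frame(
  name = c("Taco Bell", "Ohana Hawaiian BBQ", "Drop Me"),
  type = "business",
  stringsAsFactors = FALSE
)
# Assigning a list with one element per row creates a list column
df1$categories <- list(
  c("Tex-Mex", "Mexican", "Fast Food", "Restaurants"),
  c("Hawaiian", "Restaurants", "Barbeque"),
  c("Nothing to match")
)

# Values to keep (assumed name, for illustration only)
keep <- c("Mexican", "Restaurants")

# Keep only the wanted values inside each list element
df1$categories <- lapply(df1$categories, function(x) x[x %in% keep])

# lengths() counts the remaining values per row; drop rows left with none
result <- df1[lengths(df1$categories) > 0, ]
```

At this point `result$categories` is still a list column, with one character vector per remaining row.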

Collapsing list into character string

df1$categories <- sapply(df1$categories, function(x) paste(sort(x[x %in% c("Mexican", "Restaurants")]), collapse=" "))
df1[nchar(df1$categories) > 0, ]

# A tibble: 4 x 3
           categories                     name     type
                <chr>                    <chr>    <chr>
1 Mexican Restaurants                Taco Bell business
2         Restaurants       Ohana Hawaiian BBQ business
3         Restaurants Carrabba's Italian Grill business
4 Mexican Restaurants              Don Tequila business
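
As a possible variation on the collapsing step, `vapply()` can be used instead of `sapply()` so that the result is a character vector even for empty input (`sapply()` returns an empty list in that case). This is only a sketch: the sample list, the names `keep` and `collapsed`, and the `", "` separator are assumptions made for illustration.

```r
# Hypothetical sample values standing in for df1$categories
categories <- list(
  c("Tex-Mex", "Mexican", "Fast Food", "Restaurants"),
  c("Hawaiian", "Restaurants", "Barbeque"),
  c("Nothing to match")
)
keep <- c("Mexican", "Restaurants")

# vapply() with character(1) requires exactly one string per element
collapsed <- vapply(
  categories,
  function(x) paste(sort(x[x %in% keep]), collapse = ", "),
  character(1)
)

# Elements with no match collapse to "", so nzchar() can drop them
non_empty <- collapsed[nzchar(collapsed)]
```

A separator such as `", "` makes it easier to split the string again later, because category names like "Fast Food" contain spaces themselves.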

edited Nov 29 '17 at 17:27

answered Nov 29 '17 at 15:32


THX, it works but the vector is still a list, not a character. – Banjo Nov 29 '17 at 17:07
@Banjo do you want to paste all remaining entries into a single string? – manotheshark Nov 29 '17 at 17:16
Yes, the characters should be one vector. – Banjo Nov 29 '17 at 17:20
@Banjo added example to collapse list into a single string – manotheshark Nov 29 '17 at 17:27
