I have a column in my data frame that is a list of character vectors. This is the column categories:

str(df)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   4 obs. of  3 variables:
 $ categories:List of 4
  ..$ : chr  "Tex-Mex" "Mexican" "Fast Food" "Restaurants"
  ..$ : chr  "Hawaiian" "Restaurants" "Barbeque"
  ..$ : chr  "Restaurants" "Italian" "Seafood"
  ..$ : chr  "Restaurants" "Mexican" "American (Traditional)"
 $ name      : chr  "Taco Bell" "Ohana Hawaiian BBQ" "Carrabba's Italian Grill" "Don Tequila"
 $ type      : chr  "business" "business" "business" "business"

Here is a dput of the first four rows:

structure(list(categories = list(c("Tex-Mex", "Mexican", "Fast Food", 
"Restaurants"), c("Hawaiian", "Restaurants", "Barbeque"), c("Restaurants", 
"Italian", "Seafood"), c("Restaurants", "Mexican", "American (Traditional)"
)), name = c("Taco Bell", "Ohana Hawaiian BBQ", "Carrabba's Italian Grill", 
"Don Tequila"), type = c("business", "business", "business", 
"business")), row.names = c(NA, -4L), class = c("tbl_df", "tbl", 
"data.frame"), .Names = c("categories", "name", "type"))

I want to keep only certain values in each element of that list and drop all the others.

For example, I want to remove every value that is not "Mexican" or "Restaurants", so that only "Mexican" and "Restaurants" remain. To do so I tried this:

df_test <- df %>% unnest(categories) %>% 
          filter(str_detect(categories, "Mexican") |
                 str_detect(categories, "Restaurants")) %>% 
          nest(categories)

But the result looks like this:

str(df_test)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   4 obs. of  3 variables:
 $ name: chr  "Taco Bell" "Ohana Hawaiian BBQ" "Carrabba's Italian Grill" "Don Tequila"
 $ type: chr  "business" "business" "business" "business"
 $ data:List of 4
  ..$ :Classes ‘tbl_df’, ‘tbl’ and 'data.frame':    2 obs. of  1 variable:
  .. ..$ categories: chr  "Mexican" "Restaurants"
  ..$ :Classes ‘tbl_df’, ‘tbl’ and 'data.frame':    1 obs. of  1 variable:
  .. ..$ categories: chr "Restaurants"
  ..$ :Classes ‘tbl_df’, ‘tbl’ and 'data.frame':    1 obs. of  1 variable:
  .. ..$ categories: chr "Restaurants"
  ..$ :Classes ‘tbl_df’, ‘tbl’ and 'data.frame':    2 obs. of  1 variable:
  .. ..$ categories: chr  "Restaurants" "Mexican"

The problem is that the resulting column is a list of data frames, not a character vector like the type column.

Is there a way to filter out those values so that afterwards the column is a plain character vector like the name and type columns? I don't want to replace the removed values or rows with anything (for example NA): if a row contains neither "Mexican" nor "Restaurants", the row should be removed.

Used packages: dplyr, stringr (unnest() and nest() come from tidyr)

Banjo
  • Please include all packages that you are using... Not just `dplyr` – Sotos Nov 29 '17 at 13:13
  • Also, you would probably do better to add a fuller dataset (one or two additional variables) as well as present your desired outcome for that dataset. For example, it is unclear whether you want to drop all rows that do not contain "mexican" in this column or whether you want to replace that value with an NA. – lmo Nov 29 '17 at 13:18
  • Why are you using `str_detect` there? You can simply do `df %>% unnest() %>% filter(categories == 'Mexican')` – Sotos Nov 29 '17 at 13:47
  • I want to filter out more than one value. If I do it this way some rows will be multiplied. – Banjo Nov 29 '17 at 13:51
  • So just do `df %>% unnest() %>% filter(categories %in% c('Mexican', 'Restaurants'))` – Sotos Nov 29 '17 at 14:19
  • Thanks, but it's the same thing. If I use that code, the rows that contain both values will be doubled. If I `nest()` the `categories` after that, I get the same result as `str(df_test)` – Banjo Nov 29 '17 at 14:27

1 Answer

Using lapply to subset each element of the list (df1 here is the data frame from the question):

lapply(df1$categories, function(x) x[x %in% c("Mexican", "Restaurants")])

[[1]]
[1] "Mexican"     "Restaurants"

[[2]]
[1] "Restaurants"

[[3]]
[1] "Restaurants"

[[4]]
[1] "Restaurants" "Mexican"

Adding a row with no matching categories to show how such rows are filtered out:

df1 <- rbind(df1, c(list("Nothing to match"), "drop me", "business"))
df1$categories <- lapply(df1$categories, function(x) x[x %in% c("Mexican", "Restaurants")])
df1[sapply(df1$categories, length) > 0, ]
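
The filtering step above can be tried in isolation with a small stand-in data set. This is only a sketch under assumptions: `df1` is rebuilt here as a plain base data.frame with a list column rather than the tibble from the question, the third row ("Drop Me Diner") is made up to represent a row with no matches, and the name `keep` is purely illustrative. It uses `lengths()` (base R 3.2.0 or later) in place of `sapply(..., length)`.

```r
# Hypothetical stand-in for the question's data, plus one row with no matches.
df1 <- data.frame(
  name = c("Taco Bell", "Ohana Hawaiian BBQ", "Drop Me Diner"),
  type = "business",
  stringsAsFactors = FALSE
)
# Assigning a list with one element per row creates a list column.
df1$categories <- list(
  c("Tex-Mex", "Mexican", "Fast Food", "Restaurants"),
  c("Hawaiian", "Restaurants", "Barbeque"),
  c("Nothing to match")
)

keep <- c("Mexican", "Restaurants")  # values to retain (illustrative name)

# Keep only the wanted values inside each list element.
df1$categories <- lapply(df1$categories, function(x) x[x %in% keep])

# lengths() counts the elements per row; rows with zero matches are dropped.
result <- df1[lengths(df1$categories) > 0, ]
print(result$name)
```

At this point `categories` is still a list column; only the rows without any match have been removed.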

Collapsing each list element into a single character string:

df1$categories <- sapply(df1$categories, function(x) paste(sort(x[x %in% c("Mexican", "Restaurants")]), collapse=" "))
df1[nchar(df1$categories) > 0, ]

# A tibble: 4 x 3
           categories                     name     type
                <chr>                    <chr>    <chr>
1 Mexican Restaurants                Taco Bell business
2         Restaurants       Ohana Hawaiian BBQ business
3         Restaurants Carrabba's Italian Grill business
4 Mexican Restaurants              Don Tequila business
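
For completeness, here is a self-contained sketch of the collapsing step, again under assumptions: the data is a cut-down base data.frame rather than the original tibble, the "Drop Me Diner" row is made up, and `collapse_matches()` and `keep` are hypothetical names introduced for this example. It swaps `sapply()` for `vapply()` so the result is guaranteed to be a character vector even for a zero-row input, and uses `nzchar()` instead of `nchar(...) > 0`.

```r
# Hypothetical stand-in data: two rows from the question plus a non-matching row.
df1 <- data.frame(
  name = c("Taco Bell", "Don Tequila", "Drop Me Diner"),
  stringsAsFactors = FALSE
)
df1$categories <- list(
  c("Tex-Mex", "Mexican", "Fast Food", "Restaurants"),
  c("Restaurants", "Mexican", "American (Traditional)"),
  c("Nothing to match")
)

keep <- c("Mexican", "Restaurants")  # values to retain (illustrative name)

# Illustrative helper: filter one element, sort it, and paste it into one string.
# An element with no matches collapses to the empty string "".
collapse_matches <- function(x, keep, sep = " ") {
  paste(sort(x[x %in% keep]), collapse = sep)
}

# vapply() enforces exactly one character value per row.
df1$categories <- vapply(df1$categories, collapse_matches, character(1), keep = keep)

# Drop the rows whose collapsed string is empty.
result <- df1[nzchar(df1$categories), ]
str(result)
```

If some of the retained categories contain spaces themselves, passing a different separator through vapply (for example `sep = ", "`) may make the collapsed strings easier to split again later.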
manotheshark